OpenAI has reached a deal with Reddit to use the social news site's data for training AI models.
In a blog post on OpenAI's press relations site, the company said that the Reddit partnership will provide it access to "real-time, structured and unique content" (e.g., posts and replies) from Reddit, allowing its tools and models to "better understand and showcase" that content. Reddit content will be incorporated into ChatGPT, OpenAI's popular conversational AI, and the companies will work together to bring unspecified new "AI-powered features" to both Reddit users and moderators.
OpenAI will also become a Reddit advertising partner.
"Reddit will be building on OpenAI's platform of AI models to bring its powerful vision to life," OpenAI wrote in the post. "Using LLMs, ML, and AI allow Reddit to improve the user experience for everyone."
OpenAI has several similar licensing deals with content providers ranging from stock media libraries to news publishers. But the unusual angle to this one is that Sam Altman, OpenAI's CEO, has an 8.7% stake in Reddit, making him the third-largest shareholder, and was once a member of the company's board of directors.
In an attempt to discourage scrutiny, OpenAI says in its press release that, while Altman remains a Reddit shareholder, the partnership "was led by OpenAI's COO [Brad Lightcap]" and "approved by [OpenAI's] independent board of directors." (I'll note here that Altman is a member of OpenAI's board; an OpenAI spokesperson tells TechCrunch, however, that he recused himself from this decision.)
Reddit has made data licensing agreements an increasingly central part of its growth strategy as it navigates the market as a public company.
In its IPO prospectus, Reddit revealed that it has contractual agreements to license its data to customers, including Google, worth more than $200 million combined. And, in its first earnings report as a public company, Reddit reported a 450% year-over-year increase in non-ad revenue, attributable mainly to those agreements.
Reddit stock was up 11% in extended trading following the announcement of the OpenAI deal.
"The paradox I see is that, as more content on the internet is written by machines, there's an increasing premium on content that comes from real people," Reddit CEO Steve Huffman said during the company's earnings call in March. "And we have nearly two decades of authentic conversation."
Reddit's platform, which has over 1 billion posts and more than 16 billion comments (figures that grow every day thanks to its hundreds of millions of active users), is a gold mine for generative AI companies, whose models learn from examples of content, like text and images, to generate new, similar content.
But the company could face pushback from users concerned about how it's monetizing their data.
It's instructive to look at Stack Overflow, the Q&A forum for software developers, which recently inked an agreement with OpenAI to supply data for the latter's model training. In protest, some users deleted their top-rated answers to questions on the community. But Stack Overflow restored the deleted posts and banned those users, claiming that they weren't in compliance with its terms of service.
Reddit has already voiced its displeasure with one attempt to afford Reddit users greater control over their own data.
Vana, a startup built on the blockchain, is attempting to launch a data "DAO" (Decentralized Autonomous Organization) to let Reddit users pool their data and decide together how that combined data is used (or sold). Reddit banned Vana's subreddit dedicated to discussion about the DAO and, in a statement to TechCrunch, accused the company of "exploiting" its data export controls.