OpenAI Signs Agreement to Train AI Using Reddit Data | TechCrunch - Latest Global News

OpenAI and Reddit have reached an agreement that allows OpenAI to train its generative AI models on Reddit’s data.

In a blog post on OpenAI’s press site, OpenAI said that its newly formed partnership with Reddit would give it access to “structured and unique content in real time” from Reddit (for example, posts and replies) and enable its tools and models to “better understand and present” Reddit content. Reddit content will be integrated into ChatGPT, OpenAI’s AI-powered chatbot platform, and OpenAI will work with Reddit to provide unspecified new “AI-powered features” to both Reddit users and moderators.

OpenAI will also become a Reddit advertising partner.

“Reddit will build on OpenAI’s AI modeling platform to bring its powerful vision to life,” OpenAI wrote in the post. “By leveraging LLMs, ML, and AI, Reddit can improve the user experience for everyone.”

OpenAI has entered into several similar licensing agreements with content providers ranging from media libraries to news publishers. What’s unusual, however, is that OpenAI CEO Sam Altman owns an 8.7% stake in Reddit, making him the third-largest shareholder, and was once a member of the company’s board.

Perhaps to head off scrutiny, OpenAI says in its press release that while Altman remains a Reddit shareholder, the partnership “was led by OpenAI’s COO [Brad Lightcap]” and “approved by [OpenAI’s] independent board.” (It’s worth noting that Altman himself is a member of the OpenAI board.)

Reddit has made data licensing agreements an increasingly central part of its growth strategy as it settles into life as a publicly traded company.

In its IPO prospectus, Reddit disclosed that it has contractual agreements to license its data to customers, including Google, worth a total of over $200 million. And in its first earnings report as a public company, Reddit reported a 450% year-over-year increase in non-advertising revenue, largely due to these agreements.

Reddit shares rose 11% in extended trading following the OpenAI deal announcement.

“The paradox I see is that as more content on the internet is written by machines, there’s an increasing premium on content that comes from real people,” Reddit CEO Steve Huffman said during the company’s March earnings call. “And we have almost two decades of authentic conversations.”

Reddit’s platform, which has over 1 billion posts and more than 16 billion comments (numbers that grow daily thanks to its hundreds of millions of weekly active users), is a goldmine for generative AI companies, whose models learn from examples of content to generate new content such as text and images.

But the company could face resistance from users concerned about how it monetizes their data.

It’s instructive to take a look at Stack Overflow, the question-and-answer forum for software developers, which recently signed an agreement with OpenAI to provide data for its model training. In protest, some users deleted their top-rated answers to questions from the community. But Stack Overflow restored the deleted posts and banned those users, saying they had not complied with the terms of service.

Reddit has already expressed its displeasure with an attempt to give Reddit users more control over their own data.

Vana, a blockchain-based startup, is trying to launch a data “DAO” (decentralized autonomous organization) to give Reddit users the ability to pool their data and collectively decide how that combined data is used (or sold). Reddit banned Vana’s subreddit dedicated to discussing the DAO and, in a statement to TechCrunch, accused the company of “exploiting” its data export controls.
