This Week in AI: Generative AI and the Creator Compensation Problem | TechCrunch

Keeping up with an industry as fast-moving as AI is a tall order. Until an AI can do it for you, here’s a handy roundup of recent stories in the world of machine learning, along with notable research and experiments we didn’t cover on their own.

By the way – TechCrunch plans to publish an AI newsletter soon. Stay tuned.

This week in AI, eight prominent U.S. newspapers owned by investment giant Alden Global Capital, including the New York Daily News, Chicago Tribune and Orlando Sentinel, sued OpenAI and Microsoft for copyright infringement related to the companies’ use of generative AI technology. Like the New York Times in its ongoing lawsuit against OpenAI, they accuse OpenAI and Microsoft of stealing their intellectual property without permission or compensation to develop and commercialize generative models like GPT-4.

“We have spent billions of dollars gathering information and reporting news in our publications, and we can’t allow OpenAI and Microsoft to expand the big tech playbook of stealing our work to build their own businesses at our expense,” Frank Pine, the editor-in-chief who oversees Alden’s newspapers, said in a statement.

Given OpenAI’s existing partnerships with publishers and the company’s reluctance to make its entire business model dependent on the fair use argument, it seems likely that the lawsuit will end in a settlement and licensing agreement. But what about the rest of the content creators whose works are put into model training without payment?

Apparently OpenAI is thinking about it.

A recently published research article co-authored by Boaz Barak, a scientist on OpenAI’s Superalignment team, proposes a framework to compensate copyright holders “proportionally to their contributions to the creation of AI-generated content.” How? Through cooperative game theory.

The framework assesses the extent to which content in a training data set (e.g., text, images, or other data) influences what a model generates, using a game theory concept known as the Shapley value. Based on that assessment, it determines the content owners’ “rightful share” (i.e., compensation).

Let’s say you have an image-generating model trained on the artwork of four artists: John, Jacob, Jack, and Jebediah. You ask it to draw a flower in Jack’s style. The framework lets you determine how much each artist’s works influenced the art the model produces, and therefore how much compensation each should receive.
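
To make that concrete, here is a minimal sketch of an exact Shapley computation for the four-artist example, in Python. Everything in it is illustrative: the influence scores are invented stand-ins for however one would actually measure a coalition of artists’ effect on the generated image, and the paper’s real attribution method is more sophisticated than this.

```python
from itertools import combinations
from math import factorial

artists = ["John", "Jacob", "Jack", "Jebediah"]

# Hypothetical influence scores: how strongly a model trained only on a
# given artist's work can reproduce "a flower in Jack's style".
# These numbers are made up for illustration.
singleton_influence = {"John": 0.1, "Jacob": 0.1, "Jack": 0.6, "Jebediah": 0.1}

def v(coalition: frozenset) -> float:
    """Value of a coalition of artists: a simple additive score, capped at 1.0."""
    return min(1.0, sum(singleton_influence[a] for a in coalition))

def shapley(player: str) -> float:
    """Exact Shapley value: the player's marginal contribution averaged
    over all subsets of the other players, with the standard weights."""
    others = [a for a in artists if a != player]
    n = len(artists)
    total = 0.0
    for size in range(len(others) + 1):
        for subset in combinations(others, size):
            s = frozenset(subset)
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            total += weight * (v(s | {player}) - v(s))
    return total

for a in artists:
    print(f"{a}: {shapley(a):.3f}")  # John 0.100, Jacob 0.100, Jack 0.600, Jebediah 0.100
```

A convenient property of the Shapley value is that the individual shares sum exactly to the value of the full training set, so the entire “pie” gets distributed.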

There is, however, one drawback: the framework is computationally expensive. An exact Shapley calculation requires evaluating a model’s behavior over every possible subset of contributors, a number that grows exponentially with the number of content owners, so the researchers’ workarounds rely on estimates of compensation rather than precise calculations. Would that satisfy content creators? I’m not sure. If OpenAI ever puts the framework into practice, we’ll certainly find out.
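
A standard workaround for that blow-up (and plausibly the spirit of the researchers’ estimates, though their exact technique isn’t described here) is Monte Carlo approximation: sample random orders in which contributors join, and average each contributor’s marginal contribution. A sketch, reusing the hypothetical `artists` and `v` from the example above:

```python
import random

def shapley_estimate(player: str, num_samples: int = 10_000) -> float:
    """Monte Carlo Shapley estimate: average the player's marginal
    contribution over randomly sampled join orders."""
    total = 0.0
    for _ in range(num_samples):
        order = artists[:]  # artists and v come from the sketch above
        random.shuffle(order)
        before = frozenset(order[: order.index(player)])  # coalition formed before player joins
        total += v(before | {player}) - v(before)
    return total / num_samples

for a in artists:
    print(f"{a}: ~{shapley_estimate(a):.3f}")  # converges toward the exact values
```

Each sample needs only two coalition evaluations, so the cost scales with the number of samples rather than with the exponential count of subsets; the price is that payouts become noisy estimates, which is exactly the trade-off noted above.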

Here are some other notable AI stories from recent days:

  • Microsoft reiterates ban on facial recognition: Language has been added to the terms of service for Azure OpenAI Service, Microsoft’s fully managed wrapper for OpenAI technology, that more clearly prohibits integrations from being used “by or for” police departments for facial recognition in the United States.
  • The Nature of AI Native Startups: AI startups face different challenges than a typical software-as-a-service company. That was the message from Rudina Seseri, founder and managing partner of Glasswing Ventures, last week at the TechCrunch Early Stage Event in Boston; Ron has the whole story.
  • Anthropic launches a business plan: AI startup Anthropic is rolling out a new paid plan for businesses, as well as a new iOS app. Team – the enterprise plan – gives customers higher-priority access to Anthropic’s Claude 3 family of generative AI models, plus additional admin and user management controls.
  • CodeWhisperer no more: Amazon CodeWhisperer is now Q Developer, part of Amazon’s Q family of business-focused generative AI chatbots. Available on AWS, Q Developer helps developers with some of the tasks they do in their day-to-day work, such as debugging and upgrading apps, much like CodeWhisperer did.
  • Just leave Sam’s Club: Walmart-owned Sam’s Club says it’s relying on AI to speed up its “exit technology.” Instead of requiring store staff to check members’ purchases against their receipts on the way out of a store, Sam’s Club customers who pay either at a register or through the Scan & Go mobile app can now leave certain store locations without having their purchases double-checked.
  • Fish harvesting, automated: Harvesting fish is an inherently messy business. Shinkei is working to improve it with an automated system that dispatches fish more humanely and reliably, which could lead to a whole different seafood economy, Devin reports.
  • Yelp’s AI assistant: Yelp this week announced a new AI-powered chatbot for consumers – based on OpenAI models, the company says – that helps them connect with relevant businesses for their tasks (e.g., installing lighting fixtures, upgrading outdoor spaces, etc.). The company is rolling out the AI assistant in its iOS app under the Projects tab, and plans to expand it to Android later this year.

More machine learning

Photo credit: US Department of Energy

It sounds like quite a party took place at Argonne National Lab this winter, where a hundred AI and energy sector experts gathered to talk about how the rapidly evolving technology could be helpful to the nation’s infrastructure and R&D. The resulting report is more or less what you’d expect from that crowd: a lot of pie in the sky, but informative nonetheless.

Looking at nuclear power, the grid, carbon management, energy storage, and materials, the themes that emerged from this gathering were, first, that researchers need access to high-powered compute tools and resources; second, that they need to learn to spot the weak points of the simulations and predictions (including those enabled by the first thing); and third, that there is a need for AI tools that can integrate and make accessible data from many sources and in many formats. We’ve seen all these things happening across the industry in various ways, so it’s no big surprise, but nothing gets done at the federal level without a few experts putting out a paper, so it’s good to have it on the record.

Georgia Tech and Meta are working on part of this with a big new database called OpenDAC, a stack of reactions, materials, and calculations intended to help scientists design carbon capture processes more easily. It focuses on metal-organic frameworks, a promising and popular material type for carbon capture, but one with thousands of variations that haven’t been extensively tested.

The Georgia Tech team partnered with Oak Ridge National Lab and Meta’s FAIR to simulate quantum chemistry interactions on these materials, an effort that required some 400 million compute hours – far more than a university can easily muster. Hopefully it will be helpful to the climate scientists working in this field. It’s all documented here.

We hear a lot about AI applications in medicine, though most play a sort of advisory role, helping experts notice things they might not otherwise have seen, or spotting patterns that would have taken a technician hours to find. That’s partly because these machine learning models just find connections between statistics without understanding what caused what. Researchers at Cambridge and Ludwig Maximilian University of Munich are working on that, since moving past basic correlative relationships could be enormously helpful in creating treatment plans.

The work, led by Professor Stefan Feuerriegel of LMU, aims to create models that can identify causal mechanisms, not just correlations: “We give the machine rules for recognizing the causal structure and correctly formalizing the problem. Then the machine has to learn to recognize the effects of interventions and understand, so to speak, how real-life consequences are mirrored in the data that has been fed into the computers,” he said. It’s still early days for them, and they’re aware of that, but they believe their work is part of an important decade-scale development period.
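
The article doesn’t spell out the team’s method, so here is only a generic toy illustration, with entirely synthetic numbers, of the correlation-versus-intervention gap such work targets: a confounder (disease severity) drives both treatment assignment and outcome, so the naive correlation makes the treatment look harmful, while adjusting for the confounder recovers the true effect of intervening.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Confounder: disease severity; sicker patients are treated more often.
severity = rng.uniform(0, 1, n)
treated = rng.random(n) < severity              # P(treatment) rises with severity
# Ground truth: treatment improves recovery by +0.5, severity worsens it.
recovery = 0.5 * treated - 2.0 * severity + rng.normal(0, 0.1, n)

# Naive correlational comparison: treated patients look WORSE on average,
# because treatment is concentrated among the sickest.
naive = recovery[treated].mean() - recovery[~treated].mean()

# Interventional estimate via back-door adjustment: compare within
# severity strata, then average across strata.
bins = np.digitize(severity, np.linspace(0, 1, 11))
strata_effects = [
    recovery[(bins == b) & treated].mean() - recovery[(bins == b) & ~treated].mean()
    for b in range(1, 11)
]
adjusted = float(np.mean(strata_effects))

print(f"naive difference: {naive:+.2f}")    # roughly -0.17, wrongly suggests harm
print(f"adjusted effect:  {adjusted:+.2f}")  # close to the true +0.50
```

Stratification is only the simplest possible adjustment; the point is that a treatment-planning model needs to capture the causal structure, not just the raw correlation.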

Over at the University of Pennsylvania, graduate student Ro Encarnación is working on a new angle in the field of “algorithmic justice,” which has been pioneered (mostly by women and people of color) over the last seven to eight years. Her work focuses more on the users than the platforms and documents what she calls “emergent auditing.”

What do users do when TikTok or Instagram releases a filter that’s a bit racist, or an image generator that does something inflammatory? They may complain, but they also keep using it, and learn to work around or even exacerbate the problems it encodes. It may not be a “solution” the way we usually imagine one, but it demonstrates the diversity and resilience of the user side of the equation, which isn’t as fragile or passive as you might think.
