Anthropic Is Being Sued by Authors for Training Its Chatbot on Their Copyrighted Books

  • A group of authors is suing Anthropic for using pirated versions of hundreds of copyrighted books for training its AI chatbot Claude.
  • The authors demand that Anthropic immediately stop using their work. Monetary compensation has not been discussed yet.
  • Anthropic has yet to address the lawsuit.

AI startup Anthropic is being sued by a group of authors for allegedly training its AI chatbot Claude on pirated versions of their copyrighted books.

The books were taken from a dataset called “The Pile”, a significant portion of which is a subset called Books3. Books3 in turn contains a large collection of pirated ebooks, including works by Stephen King and Michael Pollan.

Authors Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson, who are representing a group of fiction and non-fiction authors, filed the lawsuit on Monday in a federal court in San Francisco and accused the company of committing large-scale theft.

“It is no exaggeration to say that Anthropic’s model seeks to profit from strip-mining the human expression and ingenuity behind each one of those works,” the lawsuit states.

What Do the Writers Want?

For now, the authors want Anthropic to stop using their work. Whether they also want compensation for the work that has already been used is unclear.

They do allege, however, that Anthropic not only took their work without compensation but also actively took steps to hide the full extent of its theft.

In addition to this lawsuit, the company is dealing with a separate legal battle with major music publishers that have accused Claude of regurgitating the lyrics of copyrighted songs.

Copyright lawsuits like these are nothing new in the AI industry. OpenAI has faced several similar lawsuits in the past.

  • It all started when Sarah Silverman, along with authors Christopher Golden and Richard Kadrey, sued OpenAI and Meta for training their AI models on datasets that included their work.
  • The New York Times also sued OpenAI and Microsoft in December 2023 for using its journalistic content for AI training without seeking permission. The newspaper demanded compensation for such use.
  • Following this, eight newspapers owned by Alden Global Capital sued OpenAI and Microsoft in April 2024 for unauthorized use of their publications for training.
  • Nvidia was also sued in March 2024 for using copyrighted work to train its NeMo AI.

But what makes Anthropic’s case different is that the company has always marketed itself as a safer, more responsible AI developer. So lawsuits like these clearly aren’t good for its brand image.

What Does the Law Say about Copyrighted Work?

Anthropic hasn’t released an official statement yet. But in the past, when the question of whether AI models should use copyrighted content for training arose, Anthropic, like many other AI companies, saw nothing wrong with it.

These companies argue that using copyrighted work to train their models falls under the “fair use” doctrine of U.S. copyright law, which permits the use of copyrighted material in certain scenarios, such as research, teaching, or transforming the copyrighted work into something different.

It is true that in some cases U.S. law allows the use of copyrighted work under “fair use”. However, four factors determine whether the use of copyrighted material is fair:

1. The intent of use – The use of copyrighted work should be for a purpose different from the author’s. For example, an author writes a book for revenue and artistic purposes, whereas Anthropic used it for AI training. Hence, the intent-of-use criterion is fulfilled in this case.

2. Nature of data – If the data used for training is factual in nature, it is more likely to fall under ‘fair use’. However, using creative work for AI training may not fall under ‘fair use’. This is still a grey area and may vary from case to case.

3. Extent and significance of data used – The use of data should serve a transformative purpose. Plus, the AI company should fully disclose the data it used. This is where Anthropic may struggle to prove its claims. Since the books it used were taken from pirated sources and there wasn’t any public disclosure, establishing ‘fair use’ can be difficult.

4. Impact on the value of copyrighted work – The owner should not suffer loss due to the use of copyrighted work and the value of their work should not deteriorate. Now, this is a difficult argument to prove. After AI models have been trained on these books, they become capable of reproducing them and replicating their style of work. This may lead to a loss of revenue and market value for the authors.

The thin line between fair use and copyright infringement will have to be drawn more clearly if courts are to make a concrete decision on the use of copyrighted material for AI training. For now, we will have to wait and see how Anthropic responds.
