In a landmark ruling for the generative AI industry, a U.S. federal judge has declared that Anthropic’s use of books to train its Claude large language model (LLM) qualifies as “fair use” under U.S. copyright law. The ruling, handed down by District Judge William Alsup in San Francisco, is the first of its kind to address how AI training intersects with copyright protections—and it sets a crucial precedent for how AI companies might defend their data sourcing practices.
Fair use affirmed—but with caveats
The case was brought by authors Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson, who alleged that Anthropic used pirated versions of their works without permission. The judge found that while Anthropic’s training of its model using these books was indeed transformative—and therefore permissible under the fair use doctrine—the storage of more than 7 million pirated books in what was described as a “central library” did infringe on the authors’ copyrights.
This nuanced ruling means Anthropic will face a trial in December 2025 to determine financial damages related to that storage violation. Under U.S. law, statutory damages can reach up to $150,000 per work if infringement is deemed willful.
Why this matters for the AI industry
At stake is the legal foundation upon which many generative AI systems are built. Companies like Anthropic, OpenAI, and Meta have used large volumes of existing human-created content—including books, news articles, and websites—to train models that produce text, code, and images. Many of these companies claim their use of such data is protected as fair use because their AI models generate new, transformative outputs.
Judge Alsup agreed, noting that Anthropic’s AI didn’t replicate or plagiarize the authors’ works, but learned from them in a way similar to how a human writer studies style and content. “Like any reader aspiring to be a writer,” he wrote, “Anthropic’s LLMs trained upon works not to replicate them—but to turn a hard corner and create something different.”
Storage of pirated content raises red flags
Despite this endorsement of AI training practices, the court took issue with how Anthropic acquired and stored the source material. Alsup questioned why the company had relied on pirated digital copies when lawful alternatives were available. He noted that the creation of a massive digital archive of books, outside of immediate training needs, could not be justified as fair use.
This part of the ruling could have broader implications. Many AI firms are under scrutiny for acquiring content from dubious sources, and this case draws a line between transformative AI training and outright copyright infringement in how that data is acquired and stored.
Next steps: balancing innovation with legal compliance
While this ruling offers the AI sector a legal defense for data training, it also highlights the need for more ethical and transparent practices around data sourcing. Companies must now consider not just how they use data, but how they obtain and manage it.
As legal frameworks evolve, the industry’s challenge will be to continue innovating without crossing into legal grey zones—a tightrope act that could determine the pace and shape of AI development in the years ahead.
(Credit: Reuters)
