Apple has been hit with a proposed class action lawsuit accusing the tech giant of using pirated books to train its AI models. Filed in a federal court in Northern California, the complaint alleges that Apple trained its OpenELM large language models on copyrighted materials without obtaining permission or offering compensation to the authors.
The plaintiffs, authors Grady Hendrix and Jennifer Roberson, argue that their books were included in a dataset known to contain pirated content. The lawsuit claims Apple made no effort to notify or pay the creators. This case adds to a growing number of legal challenges involving how AI systems are being trained on copyrighted content.
AI model training practices under scrutiny
The dispute centers on the training of OpenELM, Apple's open-source family of language models introduced earlier this year. The models were reportedly trained on large text corpora, including datasets that may contain copyrighted books shared without permission.
Apple has not issued a public comment on the case. That silence is in line with how most AI developers have responded to mounting scrutiny of their data sourcing practices. Meanwhile, the case raises broader questions about the ethical and legal frameworks surrounding AI development.
A wave of lawsuits across the industry
Apple is not the only company facing legal action. AI startup Anthropic recently agreed to a $1.5 billion settlement in a similar lawsuit filed by a group of authors. The company was accused of using books to train its Claude chatbot without consent. Microsoft, Meta, and OpenAI have also been named in lawsuits related to copyright infringement in the context of AI training.
These developments point to a rapidly escalating legal front in the battle over intellectual property rights in the AI age. Authors, publishers, and content creators are increasingly pushing back against the idea that publicly accessible data can be freely used to train commercial AI models.
As AI expands into mainstream use, legal clarity around data sourcing and content rights is becoming urgent. The outcome of these cases could help define the boundaries of fair use and shape how AI systems are developed in the future.
