Databricks Faces Lawsuit Over Copyrighted LLM Training Data

Global AI Watch · 29 April 2026 · 4 min read · The Register

Key Takeaways

1. Authors are suing Databricks, alleging that its LLM was trained on copyrighted material from about 196,000 book titles.
2. The court has allowed the authors' lawsuit to proceed.
3. The case could affect Databricks' reliance on external data sources for AI training.

Databricks is facing a class action lawsuit filed by a group of authors, including several bestselling writers, who allege that its language model, DBRX, was trained on copyrighted material from about 196,000 book titles. U.S. District Judge Charles Breyer denied Databricks' motion to dismiss, allowing the authors to pursue their claims that the company acquired this data unlawfully, in particular from a dataset called RedPajama, which was previously removed from Hugging Face for copyright violations.

The case raises significant questions about how training data for LLMs is sourced, and it could set a precedent for copyright disputes in AI development. As the proceedings continue, Databricks may need to re-evaluate how it acquires AI training data, since dependence on external datasets could expose it to further legal action. The situation highlights the challenges AI companies face in complying with copyright law and underscores growing scrutiny of data sovereignty in AI training practices.

Source

The Register: https://go.theregister.com/feed/www.theregister.com/2026/04/29/databricks_author_copyright_lawsuit_continues/
