Databricks Faces Lawsuit Over Copyrighted LLM Training Data

Global AI Watch · 29 April 2026 · 4 min read · The Register

Databricks faces a class action lawsuit filed by a group of authors, including several bestselling writers, who allege that its language model, DBRX, was trained on copyrighted material from about 196,000 book titles. U.S. District Judge Charles Breyer denied Databricks' motion to dismiss, allowing the authors to pursue their claims that the company acquired this data unlawfully, in particular through a dataset called RedPajama, which was previously removed from Hugging Face over copyright violations.

The case raises significant questions about how training data for LLMs is sourced, and it could set a precedent for copyright disputes in AI development. As the proceedings continue, Databricks may need to re-evaluate how it acquires training data, since reliance on external datasets could expose it to further legal claims. The dispute highlights the difficulty AI companies face in complying with copyright law and the growing scrutiny of data sovereignty in AI training practices.

Source

The Register: https://go.theregister.com/feed/www.theregister.com/2026/04/29/databricks_author_copyright_lawsuit_continues/


