Microsoft is being sued by writers for allegedly using pirated books to train an AI model

Microsoft is being sued by a group of authors who claim that the company used their illegally obtained books to train its Megatron AI.

Microsoft is facing legal action from a group of authors who claim the tech giant illegally used their copyrighted books to train its Megatron AI model. 

The complaint, filed in federal court in New York on Tuesday, claims that Microsoft used a dataset of nearly 200,000 pirated digital books to develop and train the AI system without the authors’ permission.

The plaintiffs include Jia Tolentino, Daniel Okrent, and Kai Bird. They contend that, by imitating the “syntax, voice, and themes” of their original works, Microsoft’s AI model effectively produces derivative content from stolen intellectual property.

The case is the latest in a series of lawsuits brought by writers, publishers, and other copyright holders against major tech companies such as Meta, Anthropic, and OpenAI. Many of these companies are accused of using creative works to build generative AI tools without authorization or payment.

The complaint claims that Microsoft used the stolen texts to train its Megatron model to respond to prompts in a human-like manner. The authors contend that this training diminishes the value of their original work in addition to infringing their copyrights. They are seeking statutory damages of up to $150,000 per infringed work, as well as a court injunction barring Microsoft from using their works going forward.

The lawsuit was filed the day after a federal judge in California issued the first significant US ruling on AI and copyright. The judge held that although AI companies such as Anthropic may be permitted to use copyrighted material under the “fair use” doctrine, they can still be held liable if the works were obtained unlawfully.

Microsoft is a major investor in OpenAI and has been rapidly expanding its AI capabilities through products integrated into its Office and Azure platforms. The company has not yet responded to the lawsuit, and the authors’ lawyers declined to comment on the case.

Tech companies have long maintained that using copyrighted material for AI training qualifies as fair use, particularly when the resulting models produce novel, transformative content. Critics, however, argue that these practices amount to the systematic exploitation of creative labor and threaten the livelihoods of journalists, artists, and writers.

As courts begin to define the legal limits of machine learning and content creation, the outcome of this case could significantly shape how intellectual property is treated in the age of artificial intelligence.
