Saturday, February 22, 2025

Meta Employees Discussed Using Copyrighted Content for AI Training, Court Documents Reveal

Court documents unsealed on Thursday reveal that Meta employees discussed using copyrighted works, obtained through legally questionable means, to train the company’s AI models. The documents were submitted by the plaintiffs in Kadrey v. Meta, one of several AI-related copyright disputes currently moving through the U.S. court system. Meta maintains that training models on IP-protected works, especially books, is “fair use.” Authors including Sarah Silverman and Ta-Nehisi Coates, who are among the plaintiffs, dispute that claim.

Earlier filings in the lawsuit suggested that Meta CEO Mark Zuckerberg approved the use of copyrighted content by Meta’s AI team for training, and that Meta paused talks with book publishers over licensing AI training data. The new filings, which include excerpts of internal work chats between Meta employees, give a clearer picture of how Meta might have used copyrighted data to train its models, including those in its Llama series.

In one documented conversation, Meta employees, including Melanie Kambadur, a senior manager for Meta’s Llama model research team, talked about the legal intricacies of training models on certain works. Xavier Martinet, a Meta research engineer, expressed a viewpoint of “ask forgiveness, not for permission,” suggesting that books be acquired and the matter be escalated to executives for decision-making. Martinet noted that the AI organization was established to allow more risk-taking. He also proposed purchasing e-books at retail prices for training rather than negotiating licensing deals with book publishers. Despite concerns raised by another staff member about potential legal challenges due to using unauthorized, copyrighted materials, Martinet compared this practice to what other startups were likely already doing.

In the same chat, Kambadur mentioned that Meta was negotiating with the document hosting platform Scribd for licenses, and noted that Meta’s legal team was taking a less conservative stance on approving the use of “publicly available data” for model training.

The filings also reveal discussions within Meta about using Libgen, a platform often associated with providing access to copyrighted works without authorization. Despite Libgen’s history of legal issues, some Meta decision-makers believed that not using Libgen could negatively impact Meta’s competitiveness in AI development. In an email to Meta AI VP Joelle Pineau, Sony Theakanath, director of product management at Meta, described Libgen as essential for achieving state-of-the-art benchmark performance. He also outlined strategies to mitigate legal risks, such as avoiding public disclosure of the use of Libgen datasets and removing files marked as “pirated” or “stolen.”

Further internal chats indicate that Meta’s AI team configured its models to avoid responding to IP-sensitive prompts, such as requests to reproduce parts of copyrighted texts or to disclose specific training data. The documents also suggest that Meta may have used Reddit data for training, potentially by emulating the behavior of a third-party app called Pushshift. Reddit announced in April 2023 that it planned to charge AI companies for data access.

In a March 2024 chat, Chaya Nayak, director of product management for Meta’s generative AI unit, indicated that Meta’s leadership might reconsider past decisions on training datasets to ensure its models had enough training data. According to the chat, content from Meta’s own platforms, such as Facebook and Instagram, was not sufficient on its own.

The plaintiffs in Kadrey v. Meta have revised their allegations multiple times since filing the lawsuit in 2023. The latest version alleges that Meta cross-referenced certain pirated books with those available for licensing to evaluate whether pursuing licensing agreements was worthwhile. Reflecting the case’s significance, Meta has bolstered its defense team with two Supreme Court litigators from the firm Paul Weiss. Meta has not yet responded to requests for comment.

DMN8 Partners