Court filings show Meta employees discussed using copyrighted content for AI training

For years, Meta staff internally discussed using copyrighted works obtained through legally questionable means to train the company’s AI models, according to court documents unsealed on Thursday.

The documents were submitted by the plaintiffs in Kadrey v. Meta, one of many AI copyright disputes slowly winding through the U.S. court system. The defendant, Meta, claims that training models on IP-protected works, particularly books, is “fair use.” The plaintiffs, who include authors Sarah Silverman and Ta-Nehisi Coates, disagree.

Earlier materials submitted in the suit alleged that Meta CEO Mark Zuckerberg gave Meta’s AI team the go-ahead to train on copyrighted content, and that Meta halted licensing talks with book publishers. The new filings, most of which show portions of internal work chats between Meta employees, paint the clearest picture yet of how Meta may have come to use copyrighted data to train its models, including models in the company’s Llama family.

In one chat, Meta employees, including Melanie Kambadur, a senior manager on Meta’s Llama model research team, discussed training models on works they knew might be legally fraught.

Xavier Martinet, a Meta research engineer, wrote in a chat dated February 2023 that his opinion was along the lines of “ask forgiveness, not permission,” according to the filings. That, Martinet wrote, is why the company set up its generative AI organization: so that it could be less risk averse.

Martinet floated the idea of buying e-books at retail prices to build a training set, rather than cutting licensing deals with individual book publishers. After another staffer pointed out that using copyrighted material without authorization could be grounds for a legal challenge, Martinet doubled down, arguing that “a gazillion” startups were probably already using pirated books for training.

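According to the filings, Martinet wrote that Meta might ultimately find out the approach was fine after all, while a gazillion startups had simply pirated large numbers of books over BitTorrent. Martinet added, as his “2 cents,” that trying to strike deals with publishers directly takes a long time.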

In the same chat, Kambadur, who noted that Meta was in talks with the document-hosting platform Scribd “and others” for licenses, cautioned that using “publicly available data” for model training would still require approvals, but said Meta’s lawyers were being “less conservative” with such approvals than they had been in the past.

Kambadur wrote that Meta still needed to get licenses or approvals for publicly available data, according to the filings. The difference now, Kambadur added, was that the company had more money, more lawyers, more bizdev help, and the ability to fast-track and escalate for speed, and that its lawyers were being a bit less conservative on approvals.

Libgen conversation

In another work chat relayed in the filings, Kambadur discussed possibly using Libgen, a “links aggregator” that provides access to copyrighted works from publishers, as an alternative to data sources that Meta might license.

Libgen has been sued a number of times, ordered to shut down, and fined tens of millions of dollars for copyright infringement. One of Kambadur’s colleagues responded with a screenshot of a Google Search result for Libgen containing the snippet “No, Libgen is not legal.”

Some decision-makers within Meta appear to have been under the impression that failing to use Libgen for model training could seriously hurt Meta’s competitiveness in the AI race.

In an email addressed to a Meta AI vice president, Sony Theakanath called Libgen “essential to meet SOTA numbers across all categories,” referring to matching the best, state-of-the-art (SOTA) AI models across benchmark categories.

Theakanath also outlined “mitigations” in the email intended to help reduce Meta’s legal exposure, including removing Libgen data clearly marked as “pirated/stolen” and simply not citing the use of Libgen publicly. “We would not disclose use of Libgen datasets used to train,” as Theakanath put it.

In practice, these mitigations entailed combing through Libgen files for words such as “stolen” or “pirated.”

In a work chat, Kambadur mentioned that Meta’s AI team also tuned its models to “avoid IP risky prompts,” that is, to decline requests such as one asking which e-books the model was trained on.

The filings contain other revelations, suggesting that Meta may have scraped Reddit data for some type of model training, possibly by mimicking the behavior of a third-party app called Pushshift. Notably, Reddit said in April 2023 that it planned to begin charging AI companies for access to data for model training.

In one chat dated March 2024, Chaya Nayak, director of product management at Meta’s generative AI org, said that Meta leadership was considering overriding past decisions on training sets, including a decision not to use Quora content or licensed books and scientific articles, to ensure the company’s models had sufficient training data.

Nayak suggested that Meta’s own data, including Facebook and Instagram content, text transcribed from videos on Meta platforms, and certain Meta for Business messages, simply was not enough. “[W]e need more data,” Nayak wrote.

The plaintiffs in Kadrey v. Meta have amended their complaint several times since the case was filed in the U.S. District Court for the Northern District of California, San Francisco Division, in 2023. The latest version alleges that Meta, among other things, cross-referenced certain pirated books with copyrighted books available for license to determine whether it made sense to pursue a licensing agreement with a publisher.

In a sign of how high Meta considers the legal stakes to be, the company has added two Supreme Court litigators from the law firm Paul Weiss to its defense team.

Meta did not immediately respond to a request for comment.
