Technology

Meta torrented & seeded 81.7 TB dataset containing copyrighted data

Emails discussing torrenting prove that Meta knew it was “illegal,” authors alleged. And Bashlykov’s warnings seemingly landed on deaf ears, with authors alleging that evidence showed Meta chose to instead hide its torrenting as best it could while downloading and seeding terabytes of data from multiple shadow libraries as recently as April 2024.

Meta allegedly concealed seeding

Supposedly, Meta tried to conceal the seeding by not using Facebook servers while downloading the dataset to “avoid” the “risk” of anyone “tracing back the seeder/downloader” from Facebook servers, an internal message from Meta researcher Frank Zhang said, while describing the work as in “stealth mode.” Meta also allegedly modified settings “so that the smallest amount of seeding possible could occur,” a Meta executive in charge of project management, Michael Clark, said in a deposition.

Now that new information has come to light, authors claim that Meta staff involved in the decision to torrent LibGen must be deposed again, because allegedly the new facts “contradict prior deposition testimony.”

Mark Zuckerberg, for example, claimed to have no involvement in decisions to use LibGen to train AI models. But unredacted messages show the “decision to use LibGen occurred” after “a prior escalation to MZ,” authors alleged.

Meta did not immediately respond to Ars’ request for comment and has maintained throughout the litigation that AI training on LibGen was “fair use.”

However, Meta has previously addressed its torrenting in a motion to dismiss filed last month, telling the court that “plaintiffs do not plead a single instance in which any part of any book was, in fact, downloaded by a third party from Meta via torrent, much less that Plaintiffs’ books were somehow distributed by Meta.”

While Meta may be confident in its legal strategy despite the new torrenting wrinkle, the social media company has seemingly complicated its case by allowing authors to expand the distribution theory that’s key to winning a direct copyright infringement claim beyond just claiming that Meta’s AI outputs unlawfully distributed their works.

As limited discovery on Meta’s seeding now proceeds, Meta is not fighting the seeding aspect of the direct copyright infringement claim at this time, telling the court that it plans to “set… the record straight and debunk… this meritless allegation on summary judgment.”

Related Articles

Back to top button