- Meta tried to use copyright law to get a version of its Llama AI model removed from GitHub.
- Meanwhile, Meta argues fiercely against other copyright holders using those same protections.
Meta argues strongly that copyright law shouldn't apply when online content is being used for free to build AI models. Unless the content in question belongs to Meta.
The company formerly known as Facebook is investing heavily in AI, releasing models and generative AI tools to catch up with the explosive popularity of OpenAI's ChatGPT.
Meta has joined Big Tech cohorts like Google and Microsoft in arguing to the US Copyright Office that the mountain of copyrighted text, imagery, and data scraped for free and used to train AI models is not protected under copyright law. Meta effectively argues that everything available on the internet falls under "fair use" because AI models like Llama do not exploit or reproduce copyrighted works. (Although, in fact, they often do.)
However, just a few months before staking out this position, Meta argued in favor of broader copyright protections for Llama.
Meta's takedown request
In early 2023, an initial version of Llama leaked online, and the large language model and its specifications were torrented and then posted to GitHub, the Microsoft-owned code-hosting platform.
Meta sent GitHub a demand that it immediately take down or "cease access" to the model, according to a copy of the request that GitHub hosts on its site. The takedown request was spotted by Franklin Graves, a lawyer and author of a newsletter about the creator economy, who recently posted about it on X.
"We have a good faith belief that use of the Meta Properties materials described above on the website is not authorized by the copyright owner, its agent, or the law," Meta argued in the request.
The takedown notice was submitted under the Digital Millennium Copyright Act, or DMCA, a law that extended copyright protections for the internet age. It was passed in 1998, nearly six years before Facebook was founded.
A not-funny joke
"No one is authorized to exhibit, reproduce, transmit, or otherwise distribute Meta Properties without the express written permission of Meta," the company wrote in its DMCA letter to GitHub.
"Now... who is going to be the first to make a joke about the irony of Meta using copyright to protect its LLM?" Graves wrote on X.
The irony is blatant. Meta does not ask millions of authors, artists, and other content creators for authorization when it uses their online creations to train and build Llama. But when Meta's own content is treated the same way, it's somehow illegal.
A Meta spokesman declined to comment. In its comments to the US Copyright Office, the company's argument amounts to this: training large language models is too hard without using everyone else's data for free and without permission.
As it told the USCO late last year, Meta thinks it's "impossible for AI developers to license the rights" to all of the copyrighted information needed to build LLMs. And yet, the models created from this information should be protected by copyright, according to Meta's letter to GitHub.
A failed attempt
Meta's attempt to get the early Llama model removed from the developer platform failed in the end.
Although GitHub did initially remove it, the GitHub user who posted the Llama details disputed the action, saying the model specifications at issue, referred to as "weights," did "not have sufficient originality to be copyrightable" because "they were copied from the works used to train Llama" and did not involve human selection.
The Llama repository remains up, and Meta has since released new versions of Llama as mostly "open source," allowing them to be freely downloaded by many developers without a license.
A notable success
It's notable that a single GitHub user successfully argued that, because Llama is essentially made up of copied parts and works, its specifications are not copyrightable.
However, this has not stopped Big Tech companies from arguing for more copyright protections for their AI models.
Several companies that submitted comments to the USCO, like Microsoft, OpenAI and even Apple, argue that outputs of their AI models and tools fall under copyright protection. That's even as lawyers for these companies insist that the billions of copyrighted inputs required to train the models cannot be similarly protected. (Meta and Google have so far not weighed in on whether AI model outputs should be protected or not).
Apple's support of copyrightable outputs from generative AI focused on the creation of computer program code, saying that when a human makes "decisions to modify, add to, enhance, or even reject suggested code," the final result should be protected by copyright law.