OpenAI offers a way for creators to opt out of AI training data. It's so onerous that one artist called it 'enraging.'

Kali Hays

Artists and image owners can now ask OpenAI to remove their images from DALL-E training data.
However, the process places a huge burden on creators to extract their own work.

For the first time, OpenAI is letting artists remove their work from the training data used for DALL-E 3, the latest version of its AI image generator. But the opt-out process is so onerous that it almost seems designed not to work.

OpenAI recently unveiled a new form that image owners and creators can use to request that owned or copyrighted images be removed from DALL-E training data.

AI models need high-quality, human-generated training data to perform well, and there's a race to accumulate as much of it as possible. But the original creators of this content have now realized that the value and intelligence embedded in their work is being ingested and processed for someone else's benefit. That's putting pressure on big tech companies to offer ways for creators to either actively decide to take part, or extricate their data from this grand AI experiment.

One by one

To have an opt-out request even be considered by OpenAI's new process, an artist, owner or rights holder has to submit an individual copy of each image they'd like removed from DALL-E's training dataset, along with a description.

For most artists, that could mean hundreds or thousands of works that need to be submitted one by one. The Georgia O'Keeffe Museum, for example, as the holder of rights to that artist's work, would need to submit individual requests for each of O'Keeffe's more than 2,000 artworks to have them considered for removal from DALL-E's dataset.

OpenAI is full of very smart technologists. The company could have rolled out a process through which an artist or owner could make a single request that all of their work be removed from the training data. But the company did not do this. Why? Probably because it needs as much data as possible to build its AI models.

"Enraging"

Toby Bartlett, an artist with a namesake consulting firm, wrote on Threads that OpenAI's DALL-E opt-out process is "enraging."

"Now artists are going to have to almost ruin their work with watermarks of epic proportions in the hopes that their work doesn't get used… if that even works!" he added.

Greg Madhere, an IT consultant, also wrote on Threads that he's recently been getting into photography and wanted to share his images online. He's hesitant now, given the degree to which online content is being scraped and used to train AI models like DALL-E and ChatGPT.

"Where is it safe to even post online anymore?" he asked.

Too late

Even if OpenAI grants an artist's or owner's opt-out request, it will only apply to "future" training data for DALL-E. DALL-E 3, which was just released, will already have been trained on any work a person now asks to have removed. Or, as OpenAI put it, its model will have "learned from their training data" and be able to "retain the concepts that they learned."

Translation: Here's the opt-out process, but it's too late because we've already sucked out most of the value from your work.

Several issues surrounding the use of copyrighted works for AI training, including opt-outs, are currently part of a rule-making process at the US Copyright Office.

"We've heard from artists and creative content owners that they don't always want their content to be used for training and so we're offering them the ability to opt out their images from future training of models," an OpenAI spokesperson said.

Robots.txt option

For those with large bodies of work or "high volume of images from specific URLs," the company suggests blocking OpenAI's web crawler GPTBot by deploying robots.txt. OpenAI said last month that it would respect the decades-old method of websites signaling they do not want to have their data scraped by a web crawler.
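For reference, here is a minimal sketch of what that looks like in practice. The two-line rule in ROBOTS_TXT is the standard way to tell a crawler identifying itself as GPTBot to stay away from an entire site. The helper name is_allowed and the example.com URL are hypothetical, used only for illustration. The check uses Python's built-in robots.txt parser, so it shows how a compliant crawler would read the rule, not how OpenAI's crawler is actually implemented.

```python
from urllib.robotparser import RobotFileParser

# A robots.txt file that asks the crawler identifying as "GPTBot"
# to stay out of every path on the site. Other crawlers are unaffected.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url.

    Hypothetical helper for illustration; it only models a crawler that
    chooses to honor robots.txt.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical image URL on a site the artist controls.
    url = "https://example.com/gallery/painting.jpg"
    print("GPTBot allowed:", is_allowed(ROBOTS_TXT, "GPTBot", url))
    print("Other bot allowed:", is_allowed(ROBOTS_TXT, "SomeOtherBot", url))
```

The file only has an effect when it is served from the root of the website hosting the images, and only if the crawler chooses to honor it.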

The trouble is, for an artist or owner to deploy robots.txt, not only would they need to know every website that hosts their images, they would need access to those websites' codebases to add a robots.txt file that could block GPTBot.

Without such access, it's likely impossible for an artist or owner to have their works removed from DALL-E training data at all.
