
Tumblr is selling user data to train AI. Things could get weird.

Feb 28, 2024, 17:25 IST
Business Insider
Tumblr will provide data from its users to help train AI models. SOPA Images
  • Tumblr's parent company is making a deal with OpenAI and Midjourney to train AI on Tumblr posts.
  • There will be an opt-out option for users who don't want their content used for training.

404 Media reported that Automattic, the company that owns WordPress and Tumblr, is making a deal to provide data from its sites to help train OpenAI's and Midjourney's AI models.

When I asked for comment after the 404 Media article ran, a representative for Automattic pointed me to a public blog post. The blog post says that Automattic's sites currently block AI crawlers, and that once the company starts sharing data with AI companies, it will offer users a way to opt out.

"We are also working directly with select AI companies as long as their plans align with what our community cares about: attribution, opt-outs, and control," the blog post says. "Our partnerships will respect all opt-out settings."

404 Media's report included internal Automattic employee messages describing how engineers were tasked with compiling posts from 2014 to 2023 but made some mistakes. The employees included posts from deleted or suspended blogs, private posts on public blogs, and private answers from the "Ask" function, the report said.

Most notably, they also included content marked NSFW or "mature," even though that content was supposed to be left out. Tumblr banned pornography and nudity in 2018, but in 2022 it loosened those rules to allow nudity (but still not sexually explicit images). It's worth reading 404's story on what Automattic is or isn't doing about these apparent errors.

ChatGPT will be introduced to fanfic

Meanwhile, anyone who has spent any time on Tumblr knows that there is a beautiful cornucopia of weird and niche stuff — especially among fandoms. So now ChatGPT will be able to write even better Fawnlock fanfic. (Yes, that's a version of Sherlock Holmes fanfiction where Sherlock and Watson are part deer.) Progress?

Tumblr is not the only social platform making deals like this. Reddit has a $60 million-a-year deal to license its data to Google to train Google's AI. Facebook and Instagram, of course, are already using data for Meta's own internal AI tools.

This can be controversial for some users, who feel uncomfortable about their content — on Tumblr, this is often personal writing or photography or art — being used to train AI.

Business Insider, through its parent company, also has a deal with OpenAI to use our news coverage in training AI. But that's a little different — I'm getting paid to write this, after all.

When platforms with user-generated content are selling that content to train AI, it feels, well, understandably weird.

I suppose one upside of this is knowing that Midjourney is going to be exposed to a lot more drawings of Sonic and Tails kissing.
