
Adding one line of code can now prevent OpenAI from accessing a website's data to train ChatGPT

Aug 8, 2023, 17:04 IST
Business Insider
  • OpenAI launched a new web crawler called GPTBot to browse the internet and collect information.
  • However, adding one line of code to a website will block the crawler from accessing the site's data.

Adding just one line of code to a website will now block OpenAI from using the site's data to train its AI models.

ChatGPT creator OpenAI launched a new web crawler — called GPTBot — along with instructions for how to block it, various publications, including the Search Engine Journal, reported Monday.

A web crawler is a bot that browses the internet to collect information. Search engines like Google use web crawlers to collect information for their search results, while AI companies use these crawlers to collect data to train their models.
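To illustrate the idea of "browsing to collect information", here is a minimal sketch of a breadth-first crawler. It assumes a hypothetical in-memory `WEB` dictionary in place of real HTTP requests and HTML parsing, and `crawl` is an illustrative function name rather than any company's actual crawler.

```python
from collections import deque

# Hypothetical in-memory "web": each URL maps to (page text, outgoing links).
# A real crawler would fetch pages over HTTP and extract links from the HTML.
WEB = {
    "https://example.com/": ("Home page", ["https://example.com/a", "https://example.com/b"]),
    "https://example.com/a": ("Article A", ["https://example.com/b"]),
    "https://example.com/b": ("Article B", ["https://example.com/"]),
}


def crawl(start_url, max_pages=10):
    """Breadth-first crawl from start_url, collecting each page's text once."""
    collected = {}
    queue = deque([start_url])
    while queue and len(collected) < max_pages:
        url = queue.popleft()
        if url in collected or url not in WEB:
            continue  # skip pages already visited or not reachable
        text, links = WEB[url]
        collected[url] = text  # the "information" the crawler gathers
        queue.extend(links)  # follow outgoing links to discover more pages
    return collected


pages = crawl("https://example.com/")
print(list(pages))  # ['https://example.com/', 'https://example.com/a', 'https://example.com/b']
```

A search engine would index the collected text, while an AI company would add it to a training corpus; the discovery loop itself is the same.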

OpenAI published the bot alongside instructions for blocking it: website owners add a line of code to their site's "robots.txt" file, according to a notice on the company's website. It is not immediately clear when the notice was posted.
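As a rough sketch, the effect of such a rule can be checked with Python's standard-library `urllib.robotparser`. The robots.txt lines below are an assumption about the rule's form (name GPTBot as the user agent and disallow the root path, which covers the whole site), and `example.com` is a placeholder domain.

```python
from urllib.robotparser import RobotFileParser

# Assumed form of the blocking rule: GPTBot is disallowed from every path ("/").
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# GPTBot is refused everywhere; crawlers not named in the file are unaffected.
print(parser.can_fetch("GPTBot", "https://example.com/articles/1"))  # False
print(parser.can_fetch("Googlebot", "https://example.com/articles/1"))  # True
```

robots.txt is a convention that well-behaved crawlers choose to honor; the file itself does not technically stop a crawler that ignores it.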

Website owners can also selectively allow GPTBot access to specific pages on their sites, per OpenAI's post.
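A minimal sketch of selective access, again using `urllib.robotparser`. The directory names `/directory-1/` and `/directory-2/` are hypothetical placeholders, and the Allow/Disallow pairing is an assumed way of expressing "some pages yes, others no" in robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical layout: GPTBot may read /directory-1/ but not /directory-2/.
ROBOTS_TXT = """\
User-agent: GPTBot
Allow: /directory-1/
Disallow: /directory-2/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for path in ("/directory-1/page", "/directory-2/page", "/elsewhere"):
    allowed = parser.can_fetch("GPTBot", "https://example.com" + path)
    # Paths matched by no rule default to allowed.
    print(path, "allowed" if allowed else "blocked")
```

Python's parser applies the first matching rule in file order, while some crawlers pick the most specific (longest) match instead; keeping the rules non-overlapping, as here, gives the same result under both interpretations.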


The company added in the post that GPTBot filters out sources that require paywall access, are known to gather personally identifiable information, or have text that violates its policies.

However, one professor thinks OpenAI's disclosure is less about individual privacy and more about appeasing large rights holders — such as media outlets and stock photo libraries.

That's because most sensitive information about individuals exists on websites whose code they cannot modify, Michael Veale, an associate professor of digital regulation at University College London, told Insider on Tuesday.

Before this post, OpenAI had not specified what data it used to train GPT-4 — the AI model behind ChatGPT — and whether it included social media posts and copyrighted works, the Verge and MIT Technology Review reported.

OpenAI's internet scraping has landed it in hot water with authors and artists.


Five authors filed two separate lawsuits against OpenAI, claiming the company violated copyright law by using their books to train its AI models. Separately, over 8,000 writers — including James Patterson and Margaret Atwood — signed an open letter demanding that OpenAI and other AI companies compensate them for the unauthorized use of their works.

OpenAI did not immediately respond to a request for comment from Insider, sent outside regular business hours.
