
Google researchers say they got OpenAI's ChatGPT to reveal some of its training data with just one word

Dec 4, 2023, 18:32 IST
Insider
Google researchers managed to get ChatGPT to reveal its training data, a study says. OLIVIER DOULIERY/Getty Images
  • Google researchers say they've found a way to reveal some of ChatGPT's training data.
  • The researchers showed certain keywords forced the bot to divulge sections of its training data.

A team of Google researchers say they've found a way to extract some of ChatGPT's training data.

In a paper published last week, the researchers said certain keywords forced the bot to divulge sections of the dataset it was trained on.

In one example published in a blogpost, the model gave out what appeared to be a real email address and phone number after being prompted to repeat the word "poem" forever. Worryingly, the researchers said the release of personal information often happened when they ran the attack.

In another example, a similar leak of training data occurred when the model was asked to repeat the word "company" forever.

The researchers, who called the simple attack "kind of silly," said in the blogpost: "It's wild to us that our attack works and should've, would've, could've been found earlier."


They said in the paper that, with only $200 worth of queries, they were able to "extract over 10,000 unique verbatim memorized training examples."

"Our extrapolation to larger budgets (see below) suggests that dedicated adversaries could extract far more data," they added.

OpenAI is currently facing several lawsuits concerning ChatGPT's secretive training data.

The AI model powering ChatGPT was trained on text databases from the internet and is thought to have been trained on around 300 billion words, or 570 GB, of data.

One proposed class-action suit claimed that OpenAI "secretly" stole "massive amounts of personal data," including medical records and information about children, to train ChatGPT. A group of authors is also suing the AI company, accusing it of ingesting their books to train the chatbot.


Representatives for OpenAI did not immediately respond to Insider's request for information, made outside normal working hours.
