A lawsuit claims OpenAI stole 'massive amounts of personal data,' including medical records and information about children, to train ChatGPT


Grace Dean


A lawsuit alleges OpenAI stole personal data from "millions of Americans" to train ChatGPT.
The lawsuit alleges OpenAI crawled the web to amass huge amounts of data without people's permission.

OpenAI stole "massive amounts of personal data" to train ChatGPT, a lawsuit alleges.

The proposed class-action suit claims that Sam Altman's company "secretly" harvested data to train its large language models so that its chatbot could replicate human language.

"Despite established protocols for the purchase and use of personal information, Defendants took a different approach: theft," the lawyers wrote in the 157-page lawsuit, filed on Wednesday in the US District Court for the Northern District of California.

The lawsuit alleges that OpenAI crawled the web to amass huge amounts of data, including vast quantities taken from social-media sites. OpenAI's proprietary AI corpus of personal data, WebText2, for example, scraped huge amounts of data from Reddit posts and the websites they linked to, the lawsuit claims.

The data accessed included "private information and private conversations, medical data, information about children — essentially every piece of data exchanged on the internet it could take — without notice to the owners or users of such data, much less with anyone's permission," per the lawsuit.

This amounted to "the negligent and otherwise illegal theft of personal data of millions of Americans who do not even use AI tools," the lawsuit claims.

OpenAI did not immediately respond to Insider's request for comment, made outside of regular working hours.

As well as scraping the "digital footprints" of the wider public, the lawsuit claims that OpenAI also stores and discloses users' private information, including the details they enter to create OpenAI accounts, their chat log data, and social media information.

Alongside people who use ChatGPT directly, this includes data from people using applications that have integrated ChatGPT, such as Snapchat, Stripe, Spotify, Microsoft Teams, and Slack, the lawsuit alleges. The companies did not immediately respond to Insider's request for comment.

The lawsuit is seeking a temporary freeze on commercial access to and commercial development of OpenAI's products until the company has implemented more regulations and safeguards, including allowing people to opt out of data collection and preventing its products from "surpassing human intelligence and harming others." The lawsuit also seeks financial compensation for people whose data was accessed to train the bots.

As well as OpenAI, major backer Microsoft was named as a defendant.

The plaintiffs were identified only by their initials, occupations, and state, which their lawyers said was to "avoid intrusive scrutiny as well as any potentially dangerous backlash."

Generative AI, which can create text, audio, images, and videos, has exploded in popularity since OpenAI released ChatGPT in November 2022. People have been using generative AI for personal, professional, and academic purposes, though there are concerns about its access to data.

Italy in March announced a temporary ban on access to ChatGPT over privacy concerns, claiming that there was no legal basis to justify "the mass collection and storage of personal data" used to train the algorithms behind ChatGPT. Some companies, including Amazon and Microsoft, have instructed employees not to enter confidential information into the chatbot. Samsung, meanwhile, has banned staff from using generative AI tools.

Wednesday's lawsuit says that though AI platforms "undoubtedly have the potential to do much good in the world," they could also create a "potentially catastrophic risk to humanity."

Beyond concerns that it could massively disrupt job markets, AI has been known to spread false information, and some people have used it for malicious purposes. OpenAI's creators have said that AI could surpass human expertise in most areas within the next 10 years, and some critics fear the technology poses an existential risk.

"We face imminent and unreasonable risks of the very fabric of our society unraveling, at the hands of profit-driven, multibillion-dollar corporations," the lawsuit says.

"Powerful companies, armed with unparalleled and highly concentrated technological capabilities, have recklessly raced to release AI technology with disregard for the catastrophic risk to humanity in the name of 'technological advancement,'" the lawsuit says.
