- Data-labeling firm Sama cut ties with OpenAI over concerns about handling potentially illegal content for AI-training purposes, Time reports.
- Kenyan workers were reportedly paid up to $2 an hour to label explicit content used to train ChatGPT.
For months, San Francisco-based AI firm Sama worked with OpenAI, the company behind the buzzy conversational AI ChatGPT, to identify and label sensitive images and text, data that was later used to train ChatGPT so it can produce impressive responses free of toxicity, Time reported in an investigation.
But in February 2022, Sama ended its partnership with OpenAI after discovering that OpenAI had allegedly requested and received 1,400 images of potentially illegal content, including child sexual abuse, bestiality, rape, and other forms of violence, for an AI-training project unrelated to ChatGPT, according to internal documents Time reviewed.
OpenAI confirmed that it used Kenyan workers to help build out a tool that tags problematic content, according to a statement to Time.
Essentially, training an AI to recognize and remove horrific content requires a labeled database of that same horrific content, and building that database was part of what Sama's contractors were tasked with.
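For readers curious what that labeling feeds into, the toy sketch below shows how a generic text classifier can be trained on human-labeled examples and then used to flag new content. It is an illustration under broad assumptions, not OpenAI's or Sama's actual system; the dataset, the categories, and the model choice (scikit-learn's TfidfVectorizer and LogisticRegression) are invented purely for demonstration.

```python
# Illustrative sketch only: a generic harmful-content text classifier
# trained on human-labeled examples. This is NOT OpenAI's or Sama's
# actual system; the data and categories below are invented for the demo.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy labeled dataset: human annotators produce pairs like these
# (text, category) at a much larger scale and with far more granular labels.
texts = [
    "have a wonderful day",
    "this recipe turned out great",
    "I will hurt you if you come here",
    "a graphic description of violence",
]
labels = ["benign", "benign", "violence", "violence"]

# Simple bag-of-words features feeding a linear classifier.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)

# Once trained, the classifier can flag new text before it reaches users
# or before it is included in training data.
print(model.predict(["a peaceful walk in the park"]))  # expected: ['benign']
```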
Under Sama's contracts, data labelers in Kenya were assigned to teams that each labeled text for a category such as sexual abuse, hate speech, or violence, according to internal documents Time obtained. Depending on their seniority and productivity, employees were paid between $1.32 and $2 an hour to scour troves of graphic content, according to four Sama employees who spoke to Time on condition of anonymity.
OpenAI and Sama did not respond to Insider's request for comment ahead of publication.
"Our mission is to ensure artificial general intelligence benefits all of humanity, and we work hard to build safe and useful AI systems that limit bias and harmful content," OpenAI said in a Time statement. "Classifying and filtering harmful [text and images] is a necessary step in minimizing the amount of violent and sexual content included in training data and creating tools that can detect harmful content."
Still, the nature of the work has caused severe distress for some data labelers, according to the report. One employee called his work "torture" after he was assigned to read an excerpt about a man engaging in a sexual act with a dog with a child present — an experience so traumatic that it gave him recurring visions, he told Time.
On rare occasions, some data labelers said they weren't provided with clear guidelines on how to categorize the content they reviewed, Time reports. One was reportedly tasked with reading a raunchy story in which Batman's sidekick, Robin, gets raped, and wasn't sure whether to label it as sexual violence because Robin ended up reciprocating the sexual acts.
Sama told Time that it provides one-on-one mental health counseling and wellness programs for employees to de-stress.
Contract workers have long complained of the mental toll of ridding tech systems of toxic content
The Time investigation's findings come as many companies that have adopted AI technology to improve their services and business processes continue to outsource content moderation work to low-wage workers outside the US, with some contractors reporting harm to their physical or mental health.
Amazon, for example, has hired video reviewers in India and Costa Rica to watch thousands of videos, work that has resulted in physical ailments like headaches and eye pain, The Verge reported. In 2019, after some Facebook contractors said they suffered from PTSD as a result of moderation work, CEO Mark Zuckerberg called the reports "a little overdramatic."
Almost a year after the fallout with OpenAI, Sama, which has also offered data-labeling services to Google and Microsoft, told Time that it would end all work dealing with graphic content by March 2023, including a $3.9 million contract with Facebook.
"After numerous discussions with our global team, Sama made the strategic decision to exit all [natural language processing and content moderation work to focus on computer vision data annotation solutions," Sama said in its statement."