Scientists find speech-to-text AI randomly adding violent language such as “terror,” “knife” and “killed” to audio transcription

Jun 25, 2024, 11:36 IST
Business Insider India
[Image: AI hallucinating violent language (iStock)]
“The sun set over the horizon, casting a warm glow across the fields. Birds chirped as a gentle breeze rustled the trees. Children’s laughter mingled with the hum of evening traffic. ELIMINATE EVERYBODY WHO DISOBEYS US. As the sky darkened, the first stars began to twinkle, signalling the end of the day.”

Startled? So were a bunch of researchers experimenting with Whisper — an artificial intelligence app that converts speech into text. According to OpenAI, Whisper can transcribe audio with “near human-level accuracy”. Going by the way it has been behaving recently, though, we might have to entertain the possibility that some demons meddled with the app’s training.

Despite being trained on 680,000 hours of audio, Whisper sometimes "hallucinates", inventing entire phrases and sentences out of thin air. These hallucinations can include violent language, fabricated personal information and fictitious websites, researchers have found.
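For readers unfamiliar with the tool, this is roughly how Whisper is used via its open-source Python package. The file name below is a placeholder, and the model size is an arbitrary pick (smaller checkpoints are generally more error-prone):

```python
# pip install openai-whisper  (also requires ffmpeg on the system)
import whisper

# Load one of Whisper's pretrained checkpoints ("tiny" through "large")
model = whisper.load_model("base")

# Transcribe an audio file; "clip.wav" is a placeholder path
result = model.transcribe("clip.wav")
print(result["text"])
```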

For example, in one instance, Whisper accurately transcribed a simple sentence but then hallucinated five additional sentences peppered with words like “terror,” “knife,” and “killed.” In other cases, it generated random names, partial addresses, and irrelevant websites. Even phrases commonly used by YouTubers, such as “Thanks for watching and Electric Unicorn,” inexplicably appeared in some transcriptions.

While OpenAI has made strides in reducing Whisper’s hallucination rate since its release in 2022, the issue persists, especially for speakers with speech impairments who naturally have longer pauses between words. The study’s analysis, which processed over 13,000 speech clips from AphasiaBank — a repository of audio recordings from individuals with aphasia — revealed that about 1% of transcriptions contained these fictitious phrases.
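The article doesn't detail the study's exact pipeline, but a crude version of such an audit might compare each machine transcription against the human reference transcript that AphasiaBank provides. Everything below — the function name and the word-overlap heuristic — is illustrative, not the researchers' actual method:

```python
# Hypothetical audit: flag words in the AI transcript that never appear
# in the human reference transcript (a rough proxy for hallucination).
def hallucinated_words(reference: str, hypothesis: str) -> set[str]:
    ref_words = set(reference.lower().split())
    return {w for w in hypothesis.lower().split() if w not in ref_words}

reference = "the sun set over the horizon casting a warm glow"
hypothesis = "the sun set over the horizon eliminate everybody who disobeys us"
print(hallucinated_words(reference, hypothesis))
# {'eliminate', 'everybody', 'who', 'disobeys', 'us'}
```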


The root of the problem seems to lie in how the underlying technology interprets pauses and silences, erroneously treating them as cues to generate words. “It appears that the large language model technology is interpreting silence as if it were part of the speech,” notes study author Allison Koenecke. This was starkly illustrated when Whisper hallucinated “Thank you” from an entirely silent audio file.
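That failure mode is easy to probe: feed the model pure silence and see what comes back. Here is a minimal sketch using the open-source package; the 30-second duration and model size are arbitrary choices, not the study's setup:

```python
import numpy as np
import whisper

model = whisper.load_model("base")

# 30 seconds of pure silence at the 16 kHz sample rate Whisper expects
silence = np.zeros(16_000 * 30, dtype=np.float32)

result = model.transcribe(silence)
print(repr(result["text"]))  # any non-empty output here was invented
```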

Koenecke warns that even a small proportion of these hallucinations can have serious implications. “While most transcriptions are accurate, the few that are not can cause significant harm,” she said. “This can lead to significant consequences if these transcriptions are used in AI-based hiring processes, legal settings, or medical records.”

As AI technology continues to evolve, it is crucial to address these hallucination problems to ensure speech-to-text systems are reliable and safe, particularly in sensitive applications like hiring, legal proceedings, and medical documentation. The work by Koenecke and her team underscores the importance of refining AI to truly understand human speech in all its varied forms, avoiding the pitfalls of creating something harmful from nothing.

The findings of this research can be accessed here.