OpenAI's tools are making up stuff in hospital transcriptions; can be dangerous for patients, experts warn

Oct 28, 2024, 14:18 IST
Business Insider India
Artificial intelligence has its hands in nearly every industry, from chatbots to healthcare, and one of the AI world's favorite new tools is Whisper, a transcription program developed by OpenAI. Whisper’s main selling point, according to the company, is that it delivers near “human-level robustness and accuracy.” But there’s a major catch: Whisper has a habit of “hallucinating.”

If you’re picturing an AI tool hallucinating psychedelic visuals, it’s not quite like that. Hallucinations in AI transcriptions mean the tool has a tendency to make up words, phrases, and even entire sentences. Whisper has been known to slip in racial commentary, violent ideas, or medical jargon that no one actually said. And when used in sensitive settings like hospitals, these “creative” additions can be more than an oddity — they could be downright dangerous.
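To make the discussion concrete, here is a minimal sketch of what a basic transcription call looks like with the open-source whisper Python package that OpenAI released. The model size and audio file name are placeholder assumptions for illustration; the point is that the output is plain text, with no marker showing which words the model may have invented.

```python
# Minimal sketch using the open-source "openai-whisper" Python package
# (pip install -U openai-whisper). Model size and file name are placeholders.
import whisper

model = whisper.load_model("base")             # larger models are slower but generally more accurate
result = model.transcribe("consultation.wav")  # returns a dict with the full "text" plus per-segment details

print(result["text"])  # the plain transcript that downstream systems typically store or display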

The hospital risk factor

Several developers and researchers have found these hallucinations are worryingly frequent. In hospitals, transcription software like Whisper is being trialed to record patient-doctor conversations. It seems like a great idea — automated transcriptions could give healthcare professionals more time to focus on patients instead of endless paperwork. But as Allison Koenecke, a professor at Cornell, points out, nearly 40% of these hallucinations could lead to misinterpretations. And, as Alondra Nelson, former director at the White House Office of Science and Technology Policy, said, “Nobody wants a misdiagnosis. There should be a higher bar.”

In medical transcriptions, even one mistaken word can be serious. One version of Whisper, for example, took the phrase, “He, the boy, was going to, I’m not sure exactly, take the umbrella,” and turned it into something alarming: “He took a big piece of a cross, a teeny, small piece… I’m sure he didn’t have a terror knife so he killed a number of people.” Disturbing additions like these could leave hospital staff baffled or, worse, could result in patient notes that entirely misrepresent a consultation.
Researchers have analysed thousands of audio samples and found that hallucinations often appear in well-recorded clips, not just in low-quality recordings or when there’s lots of background noise. For instance, when someone said, “After she got the telephone, he began to pray,” Whisper turned it into “I feel like I’m going to fall, I feel like I’m going to fall…” In other cases, it has fabricated nonexistent medications, like “hyperactivated antibiotics,” or injected racial commentary where none existed.

Hospitals aren’t the only ones using Whisper — its transcriptions are popping up in call centers, video captions, and consumer voice assistants. In each case, hallucinations could lead to confusion and miscommunication, particularly for those who rely on accuracy, such as the Deaf and hard of hearing community. Whisper can bury made-up text in long transcriptions, meaning someone reviewing the text may struggle to spot which parts are hallucinated, experts warn.
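Whisper does not ship a hallucination detector, but the open-source package does expose per-segment confidence metadata that a reviewer could use as a rough screen. The sketch below flags segments with a low average log-probability or a high no-speech probability; the thresholds are illustrative assumptions, and heuristics like this do not reliably catch hallucinations, which is why experts still call for human review against the original audio.

```python
# Illustrative heuristic only: flag Whisper segments whose confidence metadata looks
# suspicious so a human can re-check them against the original audio.
# The thresholds below are assumptions for this sketch, not validated values.
import whisper

model = whisper.load_model("base")
result = model.transcribe("consultation.wav")

for seg in result["segments"]:
    low_confidence = seg["avg_logprob"] < -1.0 or seg["no_speech_prob"] > 0.5
    if low_confidence:
        print(f"[review {seg['start']:.1f}s-{seg['end']:.1f}s] {seg['text']}")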

Legal and ethical backlash

With hospitals and health centers pushing forward with Whisper tools, former OpenAI engineer William Saunders believes OpenAI should address this flaw urgently. He points out, “It’s problematic if you put this out there and people are overconfident about what it can do and integrate it into all these other systems.”

As usage grows, so do privacy concerns. California State Assembly member Rebecca Bauer-Kahan, who recently refused to allow a health provider to share her child’s medical recordings with vendors including Microsoft Azure, the cloud platform run by OpenAI’s biggest investor, expressed frustration about tech companies handling sensitive data. “The release was very specific that for-profit companies would have the right to have this,” Bauer-Kahan said. “I was like, ‘absolutely not.’”
Despite the troubling findings, OpenAI has been receptive to researchers’ feedback. The company says it is working to address the hallucination issue, but it also warns that Whisper should not be used in “high-risk domains.” Yet companies like Nabla, which offers Whisper-based tools to transcribe medical appointments, continue to integrate it. Nabla’s approach erases the original audio for “data safety reasons,” meaning hospitals may have no way to verify transcripts against what was actually said.

Whisper’s hallucinations are a stark reminder that AI in healthcare isn’t foolproof. The conversation about AI transcription flaws is picking up steam, with experts pushing for regulations and OpenAI encouraged to prioritise fixes. While AI is undeniably changing industries, sometimes its creative input needs a reality check — especially in life-and-death fields like medicine.