IBM speech recognition is on the verge of super-human accuracy
Humans miss roughly 5% of the words in a given conversation, so companies that can create speech recognition software with error rates in that ballpark are essentially matching human capabilities.
On March 7, IBM announced it had become the first to home in on that benchmark, having achieved a rate of 5.5%. The breakthrough signals a big win for artificial intelligence that could eventually live in smartphones and voice assistants like Siri, Alexa, and Google Assistant.
"The ability to recognize speech as well as humans do is a continuing challenge, since human speech, especially during spontaneous conversation, is extremely complex," Julia Hirschberg, a professor of computer science at Columbia University, told IBM in a statement.
Over the last year, IBM has worked to break its former record of 6.9%. To cut the error rate by 1.4 percentage points, the company fine-tuned aspects of its acoustic models, which pick up different forms of speech.
Though experts like Hirschberg say machines still can't pick up certain nuances of speech, such as tone and metaphor, software has made considerable advances in rote transcription. And the tests aren't feeding machines softballs: In the latest assessment, software had to discern what humans were saying in everyday contexts, such as buying a car, which were littered with stutters, ums, and mumbling.
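For a sense of how such error rates are typically scored, here is a minimal Python sketch of word error rate: the number of word substitutions, deletions, and insertions needed to turn the machine's transcript into the human reference, divided by the number of words in the reference. This is an assumption about the metric, not a description of IBM's own scoring pipeline, and the function name and sample sentences are made up for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Hypothetical helper for illustration; not IBM's evaluation code.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (before this row) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # reference word was dropped
                curr[j - 1] + 1,                  # extra word was inserted
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Invented example: a transcript that drops a filler word and a repeated word.
reference = "um i am looking to buy a a new car"
hypothesis = "i am looking to buy a new car"
print(word_error_rate(reference, hypothesis))
```

In this invented example the transcript drops two of the ten reference words, so the score is 0.2, or 20%. By the same measure, a 5.5% rate works out to roughly one error in every 18 words.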
IBM says the 5.5% claim to fame is especially important in an industry that often can't agree on what humans are capable of.
"Others in the industry are chasing this milestone alongside us, and some have recently claimed reaching 5.9 percent as equivalent to human parity," wrote IBM research scientist George Saon.
In 2016, researchers from Microsoft announced they had built a computer that could actually beat humans at understanding conversation. The software had an error rate of 6.3%, well above IBM's new record.
But given the 5.1% goal IBM has set for itself, Saon continued, "we're not popping the champagne yet."