
Microsoft built technology that's better than a human at understanding a conversation

Oct 18, 2016, 18:30 IST

The Microsoft Research team responsible for setting the new milestone. Back row, left to right: Wayne Xiong, Geoffrey Zweig, Frank Seide. Front row: Xuedong Huang, Dong Yu, Mike Seltzer, Jasha Droppo and Andreas Stolcke. (Dan DeLong)

In December 2015, Microsoft Chief Scientist of Speech Xuedong Huang told Business Insider that "in the next four to five years, computers will be as good as humans" at understanding the words that come out of your mouth.

Less than a year later, Microsoft has set a record by announcing a system that can transcribe the contents of a phone call with "the same or fewer errors" than human professionals trained in transcription.

It's a huge milestone for speech recognition, even as gadgets like Amazon Echo and Apple's AirPods prove that voice is going to play a big role in the future of technology. And by Huang's standard, that's mission accomplished.

"We were able to move more quickly than we anticipated" thanks to advancements in artificial intelligence and acoustic technology, Microsoft Principal Researcher Geoffrey Zweig tells Business Insider, and "we were able to get here faster."

Switchboard test

Back in the 1990s, the National Institute of Standards and Technology (NIST) released a whole bunch of recorded phone conversations in English, Spanish, and Mandarin, called "Switchboard," as a way to keep things fair for the field of speech recognition research. Everybody is working from the same data, so nobody can cheat.

Since then, lots of companies, including IBM, Google, and Microsoft itself, have used the Switchboard test as one of the main ways to check the accuracy of their speech recognition software.

A phone call is a great test because, as in real life, people mumble, mutter, cough, and otherwise stumble over their words, making automatic transcription a "much more difficult task" than it would be under laboratory conditions, Zweig says.

Microsoft Distinguished Engineer Xuedong Huang. (Microsoft)

Back in September, Huang announced via blog entry that Microsoft Research had achieved an error rate on the Switchboard test of 6.3%. He said Microsoft's error rate was believed to be the best in the whole industry, and only a hair above the 5.9% average error rate among professional transcribers.
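The article doesn't spell out how those error rates are calculated. Speech recognition results are usually scored by word error rate: the number of word substitutions, deletions, and insertions needed to turn the system's transcript into the reference transcript, divided by the number of words in the reference. Here is a minimal sketch of that metric, assuming simple whitespace tokenization and lowercase matching; the function name and sample sentences are made up for illustration, and this is not Microsoft's or NIST's actual scoring tool.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative only: real scoring tools also normalize punctuation,
    contractions, filler words, and so on.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution, or a match when cost is 0
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical transcripts: one dropped word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

By this measure, a 6.3% error rate works out to roughly six wrong, missing, or extra words for every hundred words actually spoken.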

So, Microsoft made some tweaks to the model, and did what Zweig says nobody had ever done before: took the Switchboard test and gave it to those professionals to transcribe, to compare the results.

Why had nobody taken that step before? Maybe because it was "beyond the imagination" that the best systems were even close to matching a human, Zweig speculates. Regardless, the results came back and NIST verified them.

Microsoft had officially built a speech recognition system that was better than a human.

What's next?

In the shorter term, this technology is going to make Microsoft's Cortana virtual assistant much better at understanding you. In the long term, Zweig says, Microsoft is working hard to adapt this successful model to more situations.

Right now, it's optimized for listening in on a conversation on a nice, stable landline telephone. With the core speech recognition algorithms stable, researchers can now tweak the system to better understand you when you're on a noisy city street, in an echo-y conference room, or even at a McDonald's drive-thru.

And the more people use it in all these situations, the better it gets for everyone, Zweig says, as the algorithms learn and improve.

"This is a technology that's constantly improving," Zweig says.

And in general, this work is a huge and important step forward as speech recognition becomes ever more important to the future of technology. The ability to understand the words coming out of your mouth is a solid foundation on which to build better, smarter artificial intelligence that can find the context around the words.

"We've actually managed to advance the technology of speech recognition," says Zweig.
