Here's how your gadgets decipher what you are saying
How does Apple's Siri actually understand what you are saying?

It’s not really about the sound — it’s actually about the sound wave that comes out when we say something

The sound of a word vs. the sound of something else

Once a sound is recorded digitally, the computer has to work out which parts of the recording deserve its attention. To decide whether chunks of digitized sound are actually words, rather than noise from a car engine or a radio, it applies a series of mathematical operations that separate speech from everything else.
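As a rough illustration of this kind of filtering, the sketch below cuts a recording into short frames and flags the ones whose average energy crosses a threshold. The function names, frame size and threshold are all assumptions made for this example. A simple energy check like this only tells loud from quiet; separating speech from an engine or a radio takes richer features than are shown here.

```python
import math

def frame_energies(samples, frame_size):
    """Split samples into non-overlapping frames; return each frame's mean squared amplitude."""
    energies = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        energies.append(sum(x * x for x in frame) / frame_size)
    return energies

def flag_active_frames(samples, frame_size=160, threshold=0.01):
    """Mark a frame True when its energy exceeds the (illustrative) threshold."""
    return [energy > threshold for energy in frame_energies(samples, frame_size)]

# Synthetic test signal (an assumption, not real speech):
# 320 samples of a very faint tone followed by 320 samples of a much louder one.
sample_rate = 8000
quiet = [0.001 * math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(320)]
loud = [0.5 * math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(320)]

flags = flag_active_frames(quiet + loud)
print(flags)  # the two quiet frames are rejected, the two loud frames are kept
```

Only the frames flagged True would be handed on to the later recognition steps.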

Same word, different accents

Voice recognition works by breaking speech into small sound units called phonemes. English alone has about 40 of them. The computer is trained to recognize what each phoneme looks like in digital form, but the same phoneme does not always look the same: sounds vary with accent and with position in a word, and words that sound identical can be spelled differently (e.g. “to” vs. “two” vs. “too”). Using a dictionary word list and contextual relationships, the computer in your gadget makes an educated guess at what you’re saying. So, if your friend Mary is in your contact list, the command “call Mary” is linked to “Mary” and not “merry”.
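The “call Mary” example can be sketched in a few lines. The pronunciation keys, the `HOMOPHONES` table and the `resolve_call_target` function below are hypothetical, invented purely for illustration; real assistants rely on far larger dictionaries and statistical models.

```python
# Hypothetical pronunciation table: one phoneme-like key mapped to the words that share it.
HOMOPHONES = {
    "M EH R IY": ["merry", "Mary", "marry"],
    "T UW": ["to", "two", "too"],
}

def resolve_call_target(pronunciation, contacts):
    """Pick the candidate word that matches a contact; otherwise fall back to the first candidate."""
    candidates = HOMOPHONES.get(pronunciation, [])
    contact_lookup = {name.lower(): name for name in contacts}
    for word in candidates:
        if word.lower() in contact_lookup:
            return contact_lookup[word.lower()]
    return candidates[0] if candidates else None

print(resolve_call_target("M EH R IY", ["Mary", "John"]))
print(resolve_call_target("M EH R IY", ["John"]))
```

With Mary in the contact list the first call resolves to her name; without her, the sketch simply falls back to the first dictionary entry.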

“With enhanced voice recognition, you can talk to SYNC 3 with simple real-world voice commands and the system responds naturally to your voice,” says Mark Porter, Supervisor, Asia Pacific Infotainment Systems, Ford Motor Company. “It’s even been fine-tuned to deal with the Australian accent, and in China, it can understand a string of Chinese characters written by hand on its graphical interface.”

Predicting what the next word in a sentence might be

A single stream of speech can map to many different word combinations, simply because many phonemes sound alike when spoken quickly. Sometimes the result is a wacky sequence of words that makes no sense. To avoid this, the computer applies models based on how people actually talk to estimate how likely one word is to follow another.
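One simple way to estimate how likely one word is to follow another is a bigram model, which counts word pairs in example sentences. The sketch below is a toy version: the four-sentence corpus, the function names and the add-one smoothing are assumptions chosen for this example, not a description of how any particular product works.

```python
from collections import Counter

# Tiny made-up corpus standing in for "how people actually talk".
corpus = [
    "call mary now",
    "call mary at home",
    "call john at work",
    "merry christmas to you",
]

def train_bigrams(sentences):
    """Count how often each word starts a pair, and how often each pair occurs."""
    context_counts, pair_counts = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split()  # <s> marks the start of a sentence
        for prev, cur in zip(words, words[1:]):
            context_counts[prev] += 1
            pair_counts[(prev, cur)] += 1
    return context_counts, pair_counts

def sequence_score(words, context_counts, pair_counts, vocab_size):
    """Multiply add-one-smoothed bigram probabilities along the word sequence."""
    score = 1.0
    padded = ["<s>"] + words
    for prev, cur in zip(padded, padded[1:]):
        score *= (pair_counts[(prev, cur)] + 1) / (context_counts[prev] + vocab_size)
    return score

context_counts, pair_counts = train_bigrams(corpus)
vocab_size = len({word for sentence in corpus for word in sentence.split()})

# Two candidate transcriptions of the same sounds.
candidates = [["call", "merry"], ["call", "mary"]]
best = max(candidates, key=lambda c: sequence_score(c, context_counts, pair_counts, vocab_size))
print(" ".join(best))
```

Because “call” is followed by “mary” twice in the toy corpus and never by “merry”, the model scores “call mary” higher and picks it.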

Presenting the best result as quickly as possible

Once all the calculations are done and the guesses are made, the computer can finally present its best result, whether that means showing text on a screen, picking from a pre-set menu or giving a spoken response. “New, state-of-the-art voice recognition technology can achieve incredibly fast response times and are more intuitive than ever before,” explains Mr. Porter. “A user of SYNC 3 can command their car to ‘Tune to FM’, while other systems still require you to say ‘Radio’ then points you to another list and prompts you again to say the frequency of the radio station you want to listen to.”

With faster and more accurate technology now available, voice-activated commands are making our lives better in a myriad of ways. Although at times it may seem like your device is just out to annoy you with its bizarre answers, consider all the tedious calculations and complex transformations it has to do behind the scenes to recognize a single word, let alone an entire sentence. That your gadget can decipher what you say at all, and then piece together a semi-coherent response, is amazing, especially since some humans are still trying to master this skill.
