Over the last decade, voice recognition technology has made major advances. Ten years ago, using speech technology was more of a headache than it was worth. As the popularity of Siri, Google Voice and Dragon Assistant shows, speech technology has come a long way. Big companies like Apple, Google, Intel, Microsoft and Nuance are investing heavily in research and development. Each of them sees potential in voice recognition, and they are all racing to become the industry leader. But do you know who does speech recognition best? Luckily for you, we’re here to walk through a high-level speech recognition system comparison between the big players in the field. We’ll talk about some of the major breakthroughs and take a sneak peek at what the future of voice recognition might look like.
Apple, owner of the well-known speech assistant Siri (rumored to be powered by Nuance), has hired its own team of researchers and developers to spur innovation and improve its speech software. The company recently added a hands-free wake-up feature in its new iOS 8 release and has partnered with Shazam, an application that can recognize the music playing around you, to integrate the same capability into Siri.
The engineers at Google are making voice recognition multilingual: their software can understand up to five languages without the user changing a single setting. “Google will automatically detect which language you’re using. (For now, you need to stick to one language per sentence though.) You can select up to five languages total, enough to satisfy all but the most advanced polyglots.” With more than half the world speaking multiple languages, the ability to search and dictate interchangeably lets users speak to their smartphones as they naturally would. We tested this feature here at Globalme and we think it works great.
Cortana, Microsoft’s virtual assistant, is the new kid on the block and has had mixed reviews so far. Microsoft has promised to update the software twice a month. Cortana’s latest update adds the ability to define words, letting Windows Phone users impress their friends with their newfound vocabulary. The software is also rumored to launch with the new Windows 9 release.
Nuance’s Dragon Assistant and Dragon NaturallySpeaking
Nuance is also striving to make technological breakthroughs. “I should just be able to talk to [my phone] without touching it,” says Vlad Sejnoha, chief technology officer at Nuance Communications. “It will constantly be listening for trigger words, and will just do it: pop up a calendar, or ready a text message, or a browser that’s navigated to where you want to go.”
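To make the idea of trigger words a little more concrete, here is a minimal sketch in Python of how a transcript might be matched against a few trigger phrases. This is only our illustration, not how Nuance does it: the phrases, the action names and the `match_trigger` function are all made up for the example, and it assumes the speech has already been turned into text.

```python
# Hypothetical trigger phrases mapped to hypothetical action names.
# None of these come from Nuance; they only illustrate the idea.
TRIGGERS = {
    "open calendar": "show_calendar",
    "send a text": "compose_text_message",
    "go to": "open_browser",
}


def match_trigger(transcript):
    """Return (action, remainder) for the first trigger phrase found, else None.

    `transcript` is assumed to be text that a speech recognizer already produced.
    """
    text = transcript.lower().strip()
    for phrase, action in TRIGGERS.items():
        idx = text.find(phrase)
        if idx != -1:
            # Whatever follows the trigger phrase is passed along to the action.
            remainder = text[idx + len(phrase):].strip()
            return action, remainder
    return None


print(match_trigger("Please go to the weather page"))
print(match_trigger("What a nice day"))
```

A real assistant would be listening to audio continuously and would have to cope with noise and accents, which is exactly the hard part this sketch leaves out.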
What does Nuance predict for the future of speech recognition? “A further development is the addition of a deeper level of understanding,” says John West, principal solutions architect for Nuance. “Here, the aim is to not only recognize speech, but also to extract the meaning and intent of what has been said, enabling voice driven systems as a whole to react in an intelligent way, appropriate to the user’s needs.”
This kind of intelligence and integration will soon have a substantial yet unobtrusive effect on our everyday lives. “If you are in your car, a voice driven personal assistant application might tell you fuel was running low,” West says. “You could ask it where the nearest fuel station was and it could know you have a preference for a particular brand, that you are heading north on a specific road and work out the closest station for you. Or perhaps it could warn you that it is too far to reach with the fuel remaining. When you tell the system your decision, it will ask if you require directions and provide them by interfacing with the car’s GPS facility.”
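The decision West describes can be sketched in a few lines of Python. Treat this as a rough illustration under our own assumptions rather than anything Nuance has published: the `Station` fields, the `pick_station` function and the brand names are hypothetical, and we assume the car already knows each station's distance and whether it lies in the direction of travel.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Station:
    name: str
    brand: str
    distance_km: float  # assumed driving distance along the current route
    ahead: bool         # True if the station lies in the direction of travel


def pick_station(stations: List[Station], preferred_brand: str,
                 range_km: float) -> Optional[Station]:
    """Pick the closest reachable station ahead, preferring the driver's brand."""
    reachable = [s for s in stations if s.ahead and s.distance_km <= range_km]
    if not reachable:
        # Nothing is within range: the assistant should warn the driver.
        return None
    preferred = [s for s in reachable if s.brand == preferred_brand]
    candidates = preferred or reachable  # fall back to any brand if needed
    return min(candidates, key=lambda s: s.distance_km)


stations = [
    Station("Main Street", "BrandA", 12.0, True),
    Station("Highway Exit", "BrandB", 5.0, True),
    Station("Behind You", "BrandA", 3.0, False),
]
choice = pick_station(stations, "BrandA", range_km=20.0)
print(choice.name if choice else "Warning: no station within range")
```

The sketch prefers the driver's brand only when such a station is still reachable, and otherwise falls back to the nearest station of any brand. Understanding the spoken question in the first place is the part that takes the “deeper level of understanding” West talks about.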
Beyond our Phones and Automobiles
Many specialists in the tech industry claim that voice recognition will soon expand beyond smartphones and automobiles. Engineers are hard at work creating voice-activated household appliances, speakers, and much, much more. “This next-generation ‘conversation’ technology offers consumers a way out of having to use the clunky remote control interface and, rather than having to learn how to talk to the device, allow them to speak and interact with their TVs as they would with another human being.”
The Complexity of Language
Arguably the biggest challenge facing voice recognition is the need to adapt to an endless list of languages and the dialects within them. Some companies are finding ways to overcome this challenge and, in turn, are using it to differentiate themselves from their competitors (like Google Voice’s multiple-language search capability mentioned above). Other companies are finding it more difficult to make their software and features work in all languages. Apple, for example, recently made headlines for not offering keyboard dictation and voice directions in Indian languages as part of its new iOS 8 release. In a country where Apple is aggressively trying to grow its 5% market share, the company may have missed the boat.
Nevertheless, companies have clearly recognized the market potential of extending their speech technology to different languages and dialects. This is visible in the localization industry, where we have seen a large increase in the number of localization projects related to speech technologies.
Voice recognition has come a long way over the last decade, and it is clear that this isn’t the end of the road, likely not even the halfway point. The intense competition between these major technology players suggests that there is a lot more in store for voice recognition.
What would you like to see happen in the world of speech technology? Did you find this speech recognition system comparison useful? Post your comments below!