This year, the most exciting trend in voice assistant software is actually the proliferation of the hardware that will run it. Far from being limited to Amazon’s Alexa, Google’s Home, and Apple’s HomePod, modern AI voice assistants are now working their way into everything from cars to fridges to wrist-watches. As the market expands at an incredible rate, a race is developing between software giants determined to become the default voice assistant for smart devices all over the world.
To a large extent, the winner will be determined by the outcome of the ongoing battle for ownership of the new smart speaker space. These centralized hubs are intended to act as expansion points for the so-called “smart home,” and they will naturally integrate more easily with devices that run the same voice control AI. Consumers are currently deciding the future of the smart home, and of voice assistants, through these early-stage speaker purchases.
Importance of Language Support
A cursory look at the feedback for these flagship products, both professional and user-generated reviews, shows precisely what the most important differentiators are when buying. Beyond pure brand loyalty, consumers consistently report that their reaction to a particular smart home product derives from the speed and accuracy with which it understands their speech, and the naturalism of the voice it uses to respond.
Across languages, dialects, and accents, consumers want a voice assistant that both understands them and speaks to them understandably. The company or companies that continue to invest the most proactively in the development of wide language and accent support will pull ahead in the minds of consumers, cementing their place as the backbone of the future smart home. Once that’s done, it will be much more difficult for new challengers to enter the market without that enormous base of linguistic information already in hand.
Alexa’s Language Support
Right now Amazon’s Alexa supports three overall languages: English, German, and Japanese. Within the English language, Alexa offers specific support for five dialects/accents: Australia, Canada, India, UK and US. This represents a huge proportion of the current market for smart devices, but it also leaves significant room for expansion in the future. Amazon has clearly put a lot of work into language support in other forms, such as Amazon Polly, which reads text aloud in 24 languages, but the company will have to continue to invest in bringing that research to bear on the home assistant market.
Google Home’s Language Support
Google’s Home currently supports four overall languages: English, French, German, and Japanese. It also supports four English dialects, from Australia, Canada, the UK and the US. However, Google is one of the world’s most energetic spenders when it comes to software development, and it was recently revealed that the company plans to have more than 30 languages enabled by the end of 2018. That’s precisely what Google needs to be doing to ensure it remains competitive going forward. It’s also the only thing that will allow Google to close the considerable lead Apple holds through Siri’s robust language support.
Apple HomePod & Siri’s Language Support
Right now, Apple’s Siri voice assistant supports 20 languages: Arabic, Chinese, Danish, Dutch, English, Finnish, French, German, Hebrew, Italian, Japanese, Korean, Malay, Norwegian, Portuguese, Russian, Spanish, Swedish, Thai, and Turkish. It also supports a variety of dialects for Chinese, Dutch, English, French, German, Italian and Spanish. That’s an impressive lead, but with Google investing heavily it won’t be enough unless Apple continues to push forward. HomePod launched with English support only, but Siri’s full language abilities shouldn’t take long to apply through HomePod, as well.
Why Teaching Voice Assistants a New Language is Hard
Right now, Siri’s clear lead in language support is driving success in markets that Apple’s competitors simply cannot touch — but Google is close on Apple’s tail. Meanwhile, Business Insider believes that Alexa’s relative lack of language options is “significantly limiting its global reach.” With language so clearly at the heart of the fight for living-room dominance, the question becomes: why don’t more companies develop wider and more robust language abilities? The clearest answer, it turns out, actually came last year, from Google itself.
Automatic Speech Recognition & Natural Language Understanding
Google has actually been developing the basic databases for expanded language support for years, thanks in large part to the background data gathering done by its flagship Search product. This is why Google may very well overtake Apple’s enormous lead in 2018: it has already put in the work of building a backlog of more than 115 languages capable of automatic speech recognition (ASR) through Google Home. With this ability in hand, localization of Home can often proceed with half the normal amount of effort, since the voice assistant only needs to build an understanding of the intent behind the transcribed sentence. This ability is called natural language understanding, or NLU, and it is what allows support for UK English to expand to Canada without too many problems arising.
Even an assistant AI that can accurately hear and transcribe words must still go through the labor-intensive localization process of NLU — that is, Spanish may be Spanish in Spain, Mexico, and the United States, but each dialect also requires a distinct linguistic model.
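The split between ASR and NLU can be illustrated with a minimal sketch. The lexicons and the mapping below are invented for illustration only (real NLU systems use trained statistical models, not lookup tables), but they show why a shared transcription layer still needs a distinct understanding model per locale:

```python
# Toy illustration: one shared ASR transcript, but per-locale NLU.
# The lexicons below are invented examples for this sketch; real NLU
# models are statistical, not hand-written dictionaries.

# Different Spanish locales often use different words for the same
# concept, so each locale gets its own understanding model.
NLU_LEXICONS = {
    "es-ES": {"coche": "vehicle", "ordenador": "computer"},
    "es-MX": {"carro": "vehicle", "computadora": "computer"},
}

def understand(transcript: str, locale: str) -> list:
    """Map words in an ASR transcript to concepts via the locale's lexicon."""
    lexicon = NLU_LEXICONS[locale]
    return [lexicon[word] for word in transcript.lower().split() if word in lexicon]

# "coche" is the usual word for car in Spain, "carro" in Mexico: the
# ASR layer can transcribe both accurately, but only the matching
# locale model recovers the intent behind the words.
print(understand("enciende el coche", "es-ES"))  # ['vehicle']
print(understand("enciende el carro", "es-MX"))  # ['vehicle']
print(understand("enciende el carro", "es-ES"))  # []
```

Note that the Spain model returns nothing for the Mexican phrasing even though the transcript itself is perfect — which is exactly the localization gap NLU work has to close for each dialect.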
Collecting the Speech Training Data
This guide shows the many steps in localizing voice-activated technology and wearables. The in-depth process involves everything from demographic research to speech data collection and annotation, as well as testing the technology in multiple languages. Every step in this process is crucial to getting high-quality data and ensuring the voice assistant will work well in new markets; Globalme’s own research shows that while certain preferences, such as the gender of the assistant’s voice, vary mostly by user, they also show demographic differences between markets. A male voice may play better in areas that desire a more authoritative sound, for example.
NLU is where the intent behind an utterance is truly found, however; speech data needs to be specific to local usages of words, grammar, and colloquial expressions. Above all, accurate transcription and annotation are essential to building a product that works well.
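To make the transcription-and-annotation step concrete, here is a sketch of what a single annotated training utterance might look like, along with a basic quality check. The field names and the `SmartHome.TurnOn` intent are hypothetical, chosen only for illustration; real annotation schemas vary by vendor and project:

```python
# Hypothetical annotation record for one collected utterance.
# All field names and values here are illustrative, not any
# vendor's actual schema.
annotated_utterance = {
    "locale": "en-CA",
    "audio_file": "utt_0001.wav",
    "transcript": "turn on the living room lights",
    "intent": "SmartHome.TurnOn",          # invented intent name
    "slots": {"device": "lights", "room": "living room"},
    "speaker": {"gender": "female", "age_band": "25-34"},
}

def validate(record: dict) -> bool:
    """Basic annotation quality gate: every slot value must actually
    appear in the transcript it was annotated against."""
    return all(value in record["transcript"] for value in record["slots"].values())

print(validate(annotated_utterance))  # True
```

Checks like this are one small part of why annotation is labor-intensive: thousands of such records, gathered per locale and per demographic, have to be transcribed, labeled, and reviewed before an NLU model can be trained on them.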
An Outlook on the Future
The voice assistant market will continue to grow, reaching as much as seven times its current value by 2025 — and that means that even mega-corporations can’t assume they’ll be the only ones working to control the market. New challengers certainly have an uphill battle to catch up to the existing capabilities of voice assistants like Home, Siri, and Alexa, but tech startups have shocked the world with incredible advancements several times before. Alibaba is already making moves in the Chinese market, for instance, while in the West the Open Source AI Voice Assistant project has real potential to destabilize service providers that get too complacent. Lenovo seems to have chosen Alexa to power its low-cost assistant products — but there’s no telling who will win the larger war for consumer voice control.
In the end, early dominance in this emerging market will likely be the key to long-term success, and that early momentum will arise mostly from a reputation for easy, naturalistic interaction.