Last updated: Apr 10, 2018 @ 6:02 PM
Testing speech technology

Last weekend we wrote a guest post for VentureBeat comparing Siri, Cortana and Google Now in five languages: English, French, Italian, German and Mandarin. We received a number of requests asking about our methodology, so we thought it would be useful to share our specific test cases and results here.

Our test cases consisted of eighteen questions that, in our experience, users would be likely to ask a personal voice assistant. All requests were made through voice commands and were designed to test the usefulness of the assistant (rather than its accuracy in pure dictation or search). Defaulting to a web search for a query that could have produced a direct response (e.g. “call the nearest Chinese restaurant”) was penalized.

Query Results by Language

Scroll through the slideshow below to see detailed results for each language.

[Slideshow: detailed query results for each language]

Each question was scored 1.0, 0.5 or 0.0. A full pass (1.0) meant the voice assistant executed the request successfully and returned a direct response. A 0.5 was given when the right answer appeared only in a web search result, and a 0.0 was given for inaccurate answers or for requests the voice assistant could not execute. See below for an accuracy comparison across all five languages.

Testing voice technology: Cortana, Google Now and Siri in foreign languages
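
For anyone who wants to apply the same scoring to their own tests, the rubric above can be sketched in a few lines of Python. This is only an illustration: it assumes the accuracy figure for an assistant is the average score across all questions, which the post does not state explicitly. The names `Outcome`, `SCORES` and `accuracy` are made up for this example, and the sample outcomes are hypothetical, not results from the test.

```python
from enum import Enum


class Outcome(Enum):
    DIRECT = "direct"      # assistant executed the request or answered directly
    WEB_SEARCH = "search"  # right answer appeared only in a web search result
    FAILED = "failed"      # inaccurate answer, or request could not be executed


# Scores taken from the rubric: 1.0, 0.5 and 0.0.
SCORES = {
    Outcome.DIRECT: 1.0,
    Outcome.WEB_SEARCH: 0.5,
    Outcome.FAILED: 0.0,
}


def accuracy(outcomes):
    """Average score across all questions (assumed aggregation), from 0 to 1."""
    if not outcomes:
        raise ValueError("no outcomes to score")
    return sum(SCORES[o] for o in outcomes) / len(outcomes)


# Hypothetical outcomes for eighteen questions (illustrative only).
example = (
    [Outcome.DIRECT] * 10
    + [Outcome.WEB_SEARCH] * 4
    + [Outcome.FAILED] * 4
)
print(f"{accuracy(example):.1%}")  # prints 66.7%
```

Keeping the web search case as its own outcome makes the penalty for falling back to search explicit: the assistant gets half credit instead of being lumped in with outright failures.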

Do you have questions, thoughts or ideas? Post in the comments below!

If you’re interested in learning more about localizing speech technology, check out how we collect multilingual data and conduct speech recognition testing.

