Last weekend we wrote a guest post for VentureBeat comparing Siri, Cortana and Google Now in five different languages – English, French, Italian, German and Mandarin. We received a number of requests about our methodology, so we thought it would be useful to share our specific test cases and results here.
Our test cases consisted of eighteen questions that, in our experience, users would be likely to ask a personal voice assistant. All requests were issued as voice commands and were designed to test the usefulness of the assistant (rather than its accuracy in pure dictation or search). Defaulting to a web search for a query that could have produced a direct response (e.g. “call the nearest Chinese restaurant”) was penalized.
Query Results by Language
Scroll through the slideshow below to see detailed results for each language.
Each question was scored 1.0, 0.5 or 0.0. A full pass (1.0) meant the voice assistant executed the request successfully and returned a direct response. A 0.5 was given when the assistant displayed the right answer in web search results, and a 0.0 when the answer was inaccurate or the voice assistant could not execute the request. See below for an accuracy comparison across all five languages.
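For readers who want to reproduce the arithmetic, here is a minimal Python sketch of the scoring rubric. It assumes the accuracy figure for an assistant in a given language is the mean of its per-question scores; the names `Outcome` and `accuracy` are ours for illustration, and the sample tallies are hypothetical, not results from our test.

```python
from enum import Enum


class Outcome(Enum):
    DIRECT = 1.0  # assistant executed the request or returned a direct response
    SEARCH = 0.5  # right answer displayed in web search results
    FAIL = 0.0    # inaccurate answer, or the assistant could not execute the request


def accuracy(outcomes):
    """Return the mean score (0.0 to 1.0) over a list of Outcome values."""
    if not outcomes:
        raise ValueError("no outcomes to score")
    return sum(o.value for o in outcomes) / len(outcomes)


# Hypothetical tallies for one assistant in one language (eighteen questions).
example = [Outcome.DIRECT] * 10 + [Outcome.SEARCH] * 4 + [Outcome.FAIL] * 4
print(f"{accuracy(example):.1%}")  # prints 66.7%
```

Under this assumption, an assistant that falls back to web search on every question can score at most 50%, which is how the penalty for defaulting to search shows up in the totals.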