

Our client, Sonos, is one of the world’s leading sound experience brands. As the inventor of multi-room wireless home audio, Sonos helps the world listen better by giving people access to the content they love and letting them control it however they choose.
They make it easy to set up a sound system across rooms and to integrate consumer audio devices. This means managing the audio, from music to your TV, centrally.
THE CHALLENGE
Our client was developing an integration between the wireless speakers and smart home assistants. This meant that they needed speech data from three countries – USA, UK & Germany – and varying age groups.
In particular, they needed wake word data, similar to Amazon’s “Alexa” and Google’s “OK Google”. This data is used to test and tune the wake word recognition engine, ensuring that users of all demographics or dialects have a great voice experience on Sonos.



OUR APPROACH
The data set we built spanned different cultures and ages. Each recording carried several demographic identifiers specifying the participant's age, sex and language ability.
This project required strict demographic sampling and proportions. Participants were selected meticulously: they were aged 6 to 65, with a 1:1 ratio of males to females, and were tracked according to their accents. In the US this also included participants of varying ethnic descent (Asian, Indian, Hispanic and European).
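As an illustration only, the quota rules described above (ages 6 to 65, a 1:1 ratio of males to females, tracking by accent) could be checked with a sketch like the following. The record fields, accent labels and function names are hypothetical and are not taken from the project's actual tooling.

```python
from collections import Counter

def ages_in_range(participants, low=6, high=65):
    """True if every participant's age falls inside the target range."""
    return all(low <= p["age"] <= high for p in participants)

def sex_ratio_balanced(participants, tolerance=0):
    """True if male and female counts differ by at most `tolerance`."""
    counts = Counter(p["sex"] for p in participants)
    return abs(counts["male"] - counts["female"]) <= tolerance

def accent_counts(participants):
    """Number of participants recorded per accent label."""
    return Counter(p["accent"] for p in participants)

# Hypothetical sample records; field names and accent labels are assumptions.
sample = [
    {"age": 8, "sex": "female", "accent": "accent_a"},
    {"age": 34, "sex": "male", "accent": "accent_a"},
    {"age": 61, "sex": "female", "accent": "accent_b"},
    {"age": 19, "sex": "male", "accent": "accent_b"},
]

in_range = ages_in_range(sample)
balanced = sex_ratio_balanced(sample)
per_accent = accent_counts(sample)
```

Running a check like this as recordings arrive makes it easier to spot an under-represented group before collection closes.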
Once the data was collected, it was processed. The team went through each recorded segment and tagged the specific wake commands with timestamps. Using those timestamps, the audio was cropped right after the desired phrase.
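The cropping step can be sketched roughly as follows. This is a minimal illustration, assuming the audio is held as a list of samples and the tagged timestamp marks the end of the wake phrase in seconds. The function name, the optional tail padding and the 16 kHz sample rate are assumptions, not details of the actual pipeline.

```python
def crop_at_phrase_end(samples, sample_rate, phrase_end_s, tail_s=0.0):
    """Keep audio from the start up to the tagged end of the phrase.

    samples      -- sequence of audio samples (one channel)
    sample_rate  -- samples per second
    phrase_end_s -- tagged timestamp (seconds) where the wake phrase ends
    tail_s       -- optional padding (seconds) kept after the phrase
    """
    if phrase_end_s < 0 or tail_s < 0:
        raise ValueError("timestamps must be non-negative")
    end_index = round((phrase_end_s + tail_s) * sample_rate)
    # Never index past the end of the recording.
    return samples[:min(end_index, len(samples))]

# Two seconds of placeholder audio at an assumed 16 kHz sample rate,
# with the wake phrase tagged as ending at 1.2 s.
audio = [0] * 32000
clip = crop_at_phrase_end(audio, 16000, 1.2, tail_s=0.1)
```

Keeping a short tail after the tagged timestamp is one possible design choice to avoid clipping the final sound of the phrase. Whether the project used any padding is not stated.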
Collecting data is a lengthy process, and requires careful attention to detail. Our quality assurance process tells us whether the data is good to go. We work with our clients dynamically on a live data collection platform, giving access to data as it comes in.
THE TEAM
In the realm of futuristic data collection, no two projects are the same. Luckily, we have an experienced and dynamic team, able to tackle every challenge head on. We’re constantly discovering new problems, creating custom solutions to solve each one.
With a diverse range of demographics, you need the right attitude. The project team managed each participant with the right amount of sensitivity, understanding that variation in culture and age groups meant adjustments in our data collection methodology.

Meet the UK team
OUR DATA COLLECTION SOLUTIONS
Our data collection services include more than just in-field voice data collection. We also offer terminology and lexicon development, multilingual transcription, and linguistic analysis. Find details on our data collection services page or reach out to us.
19 ACCENTS ACROSS TWO LANGUAGES
799 PARTICIPANTS ACROSS VARIOUS AGE GROUPS
3 COUNTRIES WHERE WE COLLECTED DATA LOCALLY
