Dialect and Accent Speech Data Collection

Our client is one of the world’s leading sound experience brands. As the inventor of multi-room wireless home audio, Sonos helps the world listen better through innovation, giving people access to the content they love and letting them control it however they choose.

Sonos makes it easy to set up a sound system across multiple rooms and to integrate consumer audio devices, so that everything from music to TV audio can be managed centrally.

THE CHALLENGE

Our client was developing an integration between its wireless speakers and smart home assistants. This required speech data from participants of varying age groups in three countries: the USA, the UK and Germany.

In particular, they needed wake word data: recordings of wake phrases similar to Amazon’s “Alexa” and Google’s “OK Google”. This data is used to test and tune the wake word recognition engine, ensuring that users of all demographics and dialects have a great voice experience on Sonos.

[Charts: dialect and accent speech data for the UK, US and Germany]

OUR APPROACH

The data set we built spanned different cultures and age groups. The data needed to include several demographic identifiers specifying each participant’s age, sex and language ability.

This project required strict demographic sampling and proportions. Participants aged 6 to 65 were carefully selected to keep a 1:1 ratio of males to females, and each was tracked by accent. In the US, the sample also included participants of varying ethnic descent (Asian, Indian, Hispanic and European).
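The sampling rules above (ages 6 to 65, a 1:1 male-to-female ratio, tracking by accent) can be expressed as simple quota checks over a participant roster. The sketch below is illustrative only: the `Participant` record, its field names, the `check_quotas` and `accent_counts` helpers and the `tolerance` parameter are hypothetical and not part of any tool described in this case study. Checking the ratio separately for each country is also an assumption.

```python
from collections import Counter
from dataclasses import dataclass

MIN_AGE, MAX_AGE = 6, 65  # age range stated for this project


@dataclass
class Participant:
    # Hypothetical roster record; field names are illustrative only.
    participant_id: str
    country: str  # e.g. "US", "UK", "DE"
    age: int
    sex: str      # "M" or "F"
    accent: str


def check_quotas(participants, tolerance=0):
    """Return a list of quota problems (an empty list means all checks pass).

    Hypothetical helper: flags anyone outside the age range and any country
    whose male and female counts differ by more than `tolerance`.
    """
    problems = []
    for p in participants:
        if not MIN_AGE <= p.age <= MAX_AGE:
            problems.append(f"{p.participant_id}: age {p.age} outside {MIN_AGE}-{MAX_AGE}")
    # Count participants per (country, sex); missing keys count as 0.
    counts = Counter((p.country, p.sex) for p in participants)
    for country in sorted({p.country for p in participants}):
        males, females = counts[(country, "M")], counts[(country, "F")]
        if abs(males - females) > tolerance:
            problems.append(f"{country}: {males} male vs {females} female")
    return problems


def accent_counts(participants):
    """Tally participants per (country, accent) to track accent coverage."""
    return Counter((p.country, p.accent) for p in participants)


if __name__ == "__main__":
    # Hypothetical sample roster.
    sample = [
        Participant("p1", "UK", 34, "F", "Scottish"),
        Participant("p2", "UK", 70, "M", "Geordie"),
        Participant("p3", "UK", 12, "F", "Welsh"),
    ]
    for problem in check_quotas(sample):
        print(problem)  # flags p2's age and the 1 male vs 2 female imbalance
```

An exact 1:1 match is strict while recruitment is still in progress, so the `tolerance` argument lets a partially filled roster pass with a small imbalance.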

[Chart: participant age groups]

Once the data was collected, it was processed. The team went through each recorded segment and tagged the specific wake commands. Using those timestamps, the audio was cropped immediately after the desired phrase.
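Cropping at tagged timestamps can be sketched with Python’s standard `wave` module. This is a minimal illustration under stated assumptions: recordings are uncompressed WAV files, each wake phrase has been tagged with start and end times in seconds, and the `crop_to_phrase` helper and its optional `pad_s` margin are hypothetical names, not the team’s actual tooling.

```python
import wave


def crop_to_phrase(in_path, out_path, start_s, end_s, pad_s=0.0):
    """Copy the span [start_s, end_s + pad_s] of a WAV file to out_path.

    Hypothetical helper: start_s and end_s are the tagged timestamps of the
    wake phrase; pad_s is an optional margin kept after the phrase ends.
    Returns the number of audio frames written.
    """
    if end_s <= start_s:
        raise ValueError("end_s must be greater than start_s")
    with wave.open(in_path, "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        total = src.getnframes()
        # Convert seconds to frame indices, clamped to the file length.
        first = min(max(0, int(round(start_s * rate))), total)
        last = max(first, min(total, int(round((end_s + pad_s) * rate))))
        src.setpos(first)
        frames = src.readframes(last - first)
    with wave.open(out_path, "wb") as dst:
        # Same channels, sample width and rate; the frame count in the
        # header is corrected automatically when the file is closed.
        dst.setparams(params)
        dst.writeframes(frames)
    return last - first
```

For example, `crop_to_phrase("take01.wav", "take01_wake.wav", 1.20, 1.85)` (file names hypothetical) would keep only the tagged phrase. A small `pad_s` value can keep the trailing edge of the phrase from being clipped, though the right margin is an assumption that would depend on how tightly the timestamps were tagged.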

Collecting data is a lengthy process that requires careful attention to detail. Our quality assurance process determines whether the data is ready for delivery. We work with clients on a live data collection platform that gives them access to the data as it comes in.

THE TEAM

In data collection for emerging technologies, no two projects are the same. Fortunately, we have an experienced and adaptable team, able to tackle every challenge head-on. We are constantly encountering new problems and creating custom solutions for each one.

Working with such a diverse range of participants requires the right attitude. The project team handled each participant with sensitivity, understanding that differences in culture and age meant adjusting our data collection methodology.

OUR DATA COLLECTION SOLUTIONS

Our data collection services include more than just in-field voice data collection. We also offer terminology and lexicon development, multilingual transcription and linguistic analysis. Details are available on our data collection services page.

19 ACCENTS ACROSS TWO LANGUAGES

799 PARTICIPANTS ACROSS VARIOUS AGE GROUPS

3 COUNTRIES WHERE WE COLLECTED DATA LOCALLY
