Speech Recognition Data Collection

Data Management, Localization, Testing

Our client takes the stress out of human-to-tech communication. Their innovations in voice, natural language understanding, reasoning, and systems integration come together to create more human-oriented technology: tech that adapts to the way people communicate instead of forcing people to adapt to machines.


The challenge was to develop next-generation in-car speech recognition technology, and our client needed hundreds of hours of speech data covering various languages, demographics, and locations around the world. The data would be used to teach in-car systems to communicate with human beings, hence the need for a precise and comprehensive collection of the terms, accents, and phrases people would use to communicate in the vehicle.


To collect high-quality speech recognition data in the right environment and conditions, Globalme traveled to 10 countries and collected speech data from more than 2,000 participants over three months. The project began with data collections in China, Russia, Japan, Korea, Poland, Italy, Turkey, and Spain. We presented the testers with various loosely structured scenarios and asked them to use the phrasing and language most natural to them. Collecting natural utterance data is important because not all users use the same terminology, sentence structure, language, or dialect, or have the same accent.

There were countless challenges to overcome and account for: navigating in a foreign country, cultural differences in how comfortable people feel speaking to a device or sitting in a car with a stranger, air-travel logistics, figuring out different electrical systems, and so on.

When our project team eventually returned to Vancouver with suitcases full of valuable data, we collected data for another 15 languages from our hometown. Thanks to this very multicultural city, we were able to find speakers of almost every foreign language we needed right within the city limits. In Vancouver, we collected Russian, Dutch, and Korean speech data from more than 40 participants each. With the data Globalme collected, our client was able to build their research base and continue innovating in human-machine interaction.



