Speech Processing

The research goal for speech at Google aligns with our company mission: to organize the world’s information and make it universally accessible and useful. Our pioneering research work in speech processing has enabled us to build automatic speech recognition (ASR) and text-to-speech (TTS) systems that are used across Google products, with support for more than a hundred language varieties spoken across the globe. From Gboard dictation to transcriptions of voice notes, from YouTube captions to team meetings without language barriers, and from Google Maps speaking directions aloud to Google Assistant reading the news, Google’s speech research has unparalleled reach and impact. We aim to solve speech for everyone, everywhere – and work to further improve quality, speed and versatility across all kinds of speech. We're also committed to expanding our language coverage, and have set a moonshot goal to build speech technologies for 1,000 languages.

Google's speech research efforts push the state of the art in architectures and algorithms across areas like speech recognition, text-to-speech synthesis, keyword spotting, speaker recognition, and language identification. The systems we build are deployed both on servers in Google’s data centers and, increasingly, on-device. The team has a passion for research that leads to product advances for the billions of users who use speech in Google products today. We also release academic publications and open-source projects for the broader research community to leverage.

Our speech technologies are embedded in products like the Assistant, Search, Gboard, Translate, Maps, YouTube, Cloud, and many more. Thanks to close collaborations with product teams, we are in a unique position to deliver user-centric research. Our researchers can conduct live experiments to test and benchmark new algorithms directly in a realistic controlled environment. Whether these are algorithmic improvements or user experience and human-computer interaction studies, we focus on solving real problems with real impact on users.

We value the diversity of our users, and have made it a priority to deliver the best performance to every language and language variety. Today, our speech systems operate in more than 130 language varieties, and we continue to expand our reach. The challenges of internationalizing at scale are immense and rewarding. We are breaking new ground by deploying speech technologies that help people communicate, access information online, and share their knowledge – all in their own language. Combined with the unprecedented translation capabilities of Google Translate, this work also puts us at the forefront of research in speech-to-speech translation and one step closer to a universal translator.

Recent Publications

Project Euphonia: Advancing Inclusive Speech Recognition through Expanded Data Collection and Evaluation
Alicia Martín, Bob MacDonald, Julie Cattiau, Pan-Pan Jiang, Jimmy Tobin, Philip Q Nelson, Katrin Tomanek
Frontiers in Language Sciences (2025)

Binamix -- A Python Library for Generating Binaural Audio Datasets
Dan Barry, Davoud Shariat Panah, Alessandro Ragano, Jan Skoglund, Andrew Hines
AES 158th Audio Engineering Society Convention (2025)

Perceptual Evaluation of a Mix Presentation for Immersive Audio with IAMF
Carlos Tejeda-Ocampo, Toni Hirvonen, Ema Souza-Blanes, Mahmoud Namazi, Jan Skoglund
AES 158th Convention of the Audio Engineering Society (2025)

On the Design of the Binaural Rendering Library for Eclipsa Audio Immersive Audio Container
Tomasz Rudzki, Gavin Kearney, Jan Skoglund
AES 158th Convention of the Audio Engineering Society (2025)

A Novel CI Coding Strategy Based on a Cochlear Model and Deep Neural Network
Maryam Hosseini, Tim Brochier, Zachary Smith, Brett Swanson, Andrew Vandali, Alan Kan, Fadwa Alnafjan, Kat Fernandez, Richard F. Lyon
Conference on Implantable Auditory Prostheses 2025

A Study of Raters' Sensitivity to Inter-sentence Pause Durations in American English Speech
Paul Owoicho, Josh Camp, Tom Kenter
Speech Prosody 2024 (SP2024) (to appear)
