Google Tacotron 2 Speaks Like Real People

Dec 31, 2017

The still-audible difference between computer voices and real people may soon disappear. Researchers from Google and the University of California have used neural networks to develop a system that turns text into natural-sounding speech with meaningful emphasis. The project is called Tacotron 2.

Tacotron 2

Google’s Tacotron 2 project is an AI system that works with the WaveNet neural network and analyzes sentence structure and word position to determine the correct stress on syllables. For this purpose, a pitch diagram is created for the text, which then automatically shapes the intonation of the sentences during speech output. The WaveNet algorithms are already used for speech output in Google Assistant; with Tacotron 2, the spoken text simply sounds more natural. Integration into existing end products should therefore be straightforward.

According to the researchers responsible for the project, the system was trained on a 24-hour data set recorded by a professional speaker in American English. By using so-called mel spectrograms as an intermediate representation, Tacotron 2 achieves particularly natural-sounding voice output, as these allow pitch to be mapped at a finer resolution.
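To give a feel for why a mel-based representation suits speech, the following minimal Python sketch converts between hertz and the mel scale using one widely used formula and lists the band edges of an evenly spaced mel grid. The function names, the band count, and the frequency range are illustrative assumptions, not details taken from the Tacotron 2 project.

```python
import math


def hz_to_mel(freq_hz):
    """Convert a frequency in hertz to mels (common 2595 * log10 formula)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(f_min_hz, f_max_hz, n_bands):
    """Return n_bands + 1 edge frequencies (Hz), evenly spaced in mels."""
    lo = hz_to_mel(f_min_hz)
    hi = hz_to_mel(f_max_hz)
    step = (hi - lo) / n_bands
    return [mel_to_hz(lo + i * step) for i in range(n_bands + 1)]


# Illustrative values only: 80 bands between 125 Hz and 7600 Hz.
edges = mel_band_edges(125.0, 7600.0, 80)
lowest_band_width = edges[1] - edges[0]
highest_band_width = edges[-1] - edges[-2]
print(f"lowest band:  {lowest_band_width:.1f} Hz wide")
print(f"highest band: {highest_band_width:.1f} Hz wide")
```

Because the bands are equally wide in mels, they come out narrow in hertz at low frequencies and wide at high frequencies, so the pitch range that matters most for intonation is resolved most finely.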

To evaluate the quality of the system, 100 randomly selected sequences were generated as audio files and then rated by human listeners on a scale of 1 to 5. The resulting “mean opinion score” (MOS) for the AI system was an extremely good 4.525. Recordings of real human speech score only insignificantly higher, at 4.58.
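The mean opinion score described above is simply the arithmetic mean of the listeners' ratings. A minimal Python sketch follows; the function names, the sample ratings, and the normal-approximation confidence interval are assumptions for illustration and do not reproduce the researchers' evaluation.

```python
import math
import statistics


def mean_opinion_score(ratings):
    """Return the arithmetic mean of ratings on a 1-to-5 scale."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie between 1 and 5")
    return statistics.fmean(ratings)


def confidence_interval_95(ratings):
    """Approximate 95% interval for the mean (normal approximation)."""
    mean = mean_opinion_score(ratings)
    # Standard error of the mean; statistics.stdev needs two or more ratings.
    sem = statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mean - 1.96 * sem, mean + 1.96 * sem


# Hypothetical ratings from eight listeners for a single audio clip.
sample = [5, 4, 5, 4, 5, 5, 4, 5]
mos = mean_opinion_score(sample)
low, high = confidence_interval_95(sample)
print(f"MOS = {mos:.3f}, 95% CI = [{low:.3f}, {high:.3f}]")
```

Reporting an interval alongside the mean helps show whether a gap as small as the one between 4.525 and 4.58 could be explained by rating noise alone.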

If you want to judge Google’s new speech output for yourself, you can do so on a demo page. There, the researchers have uploaded a series of audio files for text snippets that the system had not seen before. The quality of the speech output is remarkable and virtually indistinguishable from normal human pronunciation. Tacotron 2 even copes with typos and can place individual words in their overall context so that the emphasis fits.

Although the AI system is still only basic research, its near-perfect results suggest it will not be long before Google integrates the technology into Google Assistant and other products. Other IT companies, such as Google’s Chinese counterpart Baidu, are already working on similar systems. As early as March of this year, Baidu engineers announced a breakthrough in their own voice output system.

