Digital Einstein Experience: Fast Text-to-Speech for Conversational AI

Joanna Rownicka; Kilian Sprenkamp; Antonio Tripiana; Volodymyr Gromoglasov; Timo P Kunz

arXiv:2107.10658·eess.AS·July 23, 2021

TL;DR

This paper presents a fast, real-time text-to-speech system for a Digital Einstein character, enabling engaging human-computer interactions with a custom, contextually fitting voice.

Contribution

It describes a TTS pipeline that combines FastSpeech 2 and Parallel WaveGAN with a cloud architecture for real-time delivery of a custom Einstein voice.

Findings

01 Supports real-time speech synthesis for conversational AI
02 Achieves high-quality, contextually appropriate voice output
03 Enables interactive digital Einstein experience

Abstract

We describe our approach to creating and delivering a custom voice for a conversational AI use case. More specifically, we provide a voice for a Digital Einstein character to enable human-computer interaction within the digital conversation experience. To create a voice that fits the context well, we first design a voice character and produce recordings that correspond to the desired speech attributes. We then model the voice. Our solution uses FastSpeech 2 to predict log-scaled mel-spectrograms from phonemes and Parallel WaveGAN to generate the waveforms. The system takes character input and produces a speech waveform as output. We use a custom dictionary for selected words to ensure their proper pronunciation. Our proposed cloud architecture enables fast voice delivery, making it possible to talk to the digital version of Albert Einstein in real time.
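The flow described in the abstract (characters, then phonemes with a custom-dictionary override, then a mel-spectrogram, then a waveform) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the dictionary entry, the letter-by-letter fallback, and the names `CUSTOM_DICT`, `fallback_g2p`, `text_to_phonemes`, and `synthesize` are all hypothetical, and the acoustic model and vocoder are passed in as plain callables standing in for FastSpeech 2 and Parallel WaveGAN.

```python
import re
from typing import Callable, Dict, List, Sequence

# Hypothetical custom dictionary: word -> phoneme symbols.
# The entry below is illustrative only and is not taken from the paper.
CUSTOM_DICT: Dict[str, List[str]] = {
    "einstein": ["AY1", "N", "S", "T", "AY2", "N"],
}


def fallback_g2p(word: str) -> List[str]:
    """Placeholder grapheme-to-phoneme step: one symbol per letter.

    A real system would use a trained G2P model or a full lexicon here.
    """
    return [ch.upper() for ch in word if ch.isalpha()]


def text_to_phonemes(
    text: str,
    custom_dict: Dict[str, List[str]] = CUSTOM_DICT,
    g2p: Callable[[str], List[str]] = fallback_g2p,
) -> List[str]:
    """Convert character input to phonemes; custom entries take priority."""
    phonemes: List[str] = []
    for word in re.findall(r"[a-z']+", text.lower()):
        if word in custom_dict:
            phonemes.extend(custom_dict[word])
        else:
            phonemes.extend(g2p(word))
    return phonemes


def synthesize(
    text: str,
    acoustic_model: Callable[[List[str]], Sequence],
    vocoder: Callable[[Sequence], Sequence],
) -> Sequence:
    """Chain the stages: text -> phonemes -> mel-spectrogram -> waveform."""
    phonemes = text_to_phonemes(text)
    mel = acoustic_model(phonemes)  # stand-in for FastSpeech 2
    return vocoder(mel)             # stand-in for Parallel WaveGAN
```

Looking up selected words before the generic grapheme-to-phoneme step is one simple way to pin the pronunciation of names and technical terms without retraining either model; how the authors integrate their dictionary is not specified in the abstract.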

Taxonomy

Topics: Speech Recognition and Synthesis · Topic Modeling · Natural Language Processing Techniques

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · FastSpeech 2 · Adam · Phase Shuffle · Convolution · LAMB