Digital Einstein Experience: Fast Text-to-Speech for Conversational AI
Joanna Rownicka, Kilian Sprenkamp, Antonio Tripiana, Volodymyr Gromoglasov, Timo P Kunz

TL;DR
This paper presents a fast, real-time text-to-speech system for a Digital Einstein character, enabling engaging human-computer interactions with a custom, contextually fitting voice.
Contribution
It introduces a novel TTS pipeline that combines FastSpeech 2 and Parallel WaveGAN, optimized for real-time delivery of a custom Einstein voice.
Findings
Supports real-time speech synthesis for conversational AI
Achieves high-quality, contextually appropriate voice output
Enables interactive digital Einstein experience
Abstract
We describe our approach to create and deliver a custom voice for a conversational AI use-case. More specifically, we provide a voice for a Digital Einstein character, to enable human-computer interaction within the digital conversation experience. To create a voice that fits the context well, we first design a voice character and produce recordings that correspond to the desired speech attributes. We then model the voice. Our solution utilizes FastSpeech 2 for log-scaled mel-spectrogram prediction from phonemes and Parallel WaveGAN to generate the waveforms. The system takes character input and produces a speech waveform at the output. We use a custom dictionary for selected words to ensure their proper pronunciation. Our proposed cloud architecture enables fast voice delivery, making it possible to talk to the digital version of Albert Einstein in real time.
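The custom-dictionary step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the lexicon entry, the phoneme symbols, and the names `CUSTOM_LEXICON`, `fallback_g2p`, and `text_to_phonemes` are all hypothetical, and the fallback is a trivial stand-in for a real grapheme-to-phoneme model. The idea shown is only that selected words are looked up in a hand-written dictionary before any automatic conversion, so that their pronunciation is fixed ahead of the acoustic model.

```python
import re

# Hypothetical custom lexicon; the entry and phoneme symbols are illustrative only.
CUSTOM_LEXICON = {
    "einstein": ["AY1", "N", "S", "T", "AY2", "N"],
}


def fallback_g2p(word):
    """Stand-in for a real grapheme-to-phoneme model (assumption):
    it simply spells the word out as uppercase letters."""
    return [ch.upper() for ch in word]


def text_to_phonemes(text, lexicon=CUSTOM_LEXICON, g2p=fallback_g2p):
    """Convert raw character input to a flat phoneme sequence.

    Words found in the custom lexicon use their hand-written pronunciation;
    all other words go through the fallback converter.
    """
    words = re.findall(r"[a-z']+", text.lower())
    phonemes = []
    for word in words:
        phonemes.extend(lexicon.get(word) or g2p(word))
    return phonemes


if __name__ == "__main__":
    print(text_to_phonemes("Hi, Einstein!"))
```

In a full pipeline of the kind the abstract describes, the resulting phoneme sequence would be passed to the mel-spectrogram predictor and then to the vocoder; those stages are omitted here because the summary gives no details about them.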
Taxonomy
Topics: Speech Recognition and Synthesis · Topic Modeling · Natural Language Processing Techniques
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · FastSpeech 2 · Adam · Phase Shuffle · Convolution · LAMB
