What the Future Brings: Investigating the Impact of Lookahead for Incremental Neural TTS
Brooke Stephenson, Laurent Besacier, Laurent Girin, Thomas Hueber

TL;DR
This paper investigates how the encoder representations of an incremental neural TTS system evolve as lookahead increases, and how this affects speech quality. A lookahead of one or two words brings token representations close to their full-context form, yet speech quality remains lower than with the full sentence.
Contribution
It provides a detailed analysis of incremental neural TTS behavior with varying lookahead and identifies key text features influencing representation evolution.
Findings
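With a 1-word lookahead, tokens travel on average 88% of the way to their full-context representation
With a 2-word lookahead, tokens travel on average 94% of the way to their full-context representation
Speech quality with a 2-word lookahead is significantly lower than with the full sentence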
Abstract
In incremental text to speech synthesis (iTTS), the synthesizer produces an audio output before it has access to the entire input sentence. In this paper, we study the behavior of a neural sequence-to-sequence TTS system when used in an incremental mode, i.e. when generating speech output for token n, the system has access to n + k tokens from the text sequence. We first analyze the impact of this incremental policy on the evolution of the encoder representations of token n for different values of k (the lookahead parameter). The results show that, on average, tokens travel 88% of the way to their full context representation with a one-word lookahead and 94% after 2 words. We then investigate which text features are the most influential on the evolution towards the final representation using a random forest analysis. The results show that the most salient factors are related to token…
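The incremental policy and the "percentage of the way travelled" measure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `lookahead_prefixes` and `fraction_travelled` are hypothetical, and both the use of Euclidean distance and the normalisation against the no-lookahead representation are assumptions, since this summary does not state how the paper measures distance.

```python
import math

def lookahead_prefixes(tokens, k):
    """Return, for each position n, the text visible when synthesizing
    token n with a lookahead of k tokens (the prefix up to token n + k)."""
    return [tokens[: min(n + 1 + k, len(tokens))] for n in range(len(tokens))]

def fraction_travelled(rep_k0, rep_k, rep_full):
    """Share of the path from the no-lookahead representation (rep_k0)
    to the full-context representation (rep_full) covered at lookahead k.

    Assumption: Euclidean distance; the paper may use another measure.
    """
    total = math.dist(rep_k0, rep_full)
    if total == 0.0:
        return 1.0  # the representation never moves, so it is already final
    remaining = math.dist(rep_k, rep_full)
    return 1.0 - remaining / total

# With k = 1, each token sees itself plus one following token.
print(lookahead_prefixes("the cat sat down".split(), 1))

# Toy 2-D vectors standing in for encoder outputs of one token.
print(fraction_travelled([0.0, 0.0], [3.0, 0.0], [4.0, 0.0]))
```

Under these assumptions, a value of 0.88 for a token would mean that its one-word-lookahead representation has closed 88% of the gap between its no-lookahead and full-sentence representations.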
Taxonomy
Topics: Speech Recognition and Synthesis · Topic Modeling · Music and Audio Processing
