BERT, can HE predict contrastive focus? Predicting and controlling prominence in neural TTS using a language model
Brooke Stephenson, Laurent Besacier, Laurent Girin, Thomas Hueber

TL;DR
This paper investigates whether BERT can predict contrastive focus on personal pronouns for neural TTS, drawing on language model representations and context from preceding utterances, and whether pronoun prominence can then be controlled in the synthesized speech.
Contribution
It introduces a method to predict contrastive focus on pronouns using BERT and evaluates controllability of prominence in TTS, addressing a challenging prosodic prediction task.
Findings
BERT can predict contrastive focus with reasonable accuracy.
Context from previous utterances improves prediction performance.
Pronoun prominence in TTS can be controlled by conditioning the model on acoustic prominence features.
Abstract
Several recent studies have tested the use of transformer language model representations to infer prosodic features for text-to-speech synthesis (TTS). While these studies have explored prosody in general, in this work, we look specifically at the prediction of contrastive focus on personal pronouns. This is a particularly challenging task as it often requires semantic, discursive and/or pragmatic knowledge to predict correctly. We collect a corpus of utterances containing contrastive focus and we evaluate the accuracy of a BERT model, finetuned to predict quantized acoustic prominence features, on these samples. We also investigate how past utterances can provide relevant information for this prediction. Furthermore, we evaluate the controllability of pronoun prominence in a TTS model conditioned on acoustic prominence features.
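The abstract mentions finetuning BERT to predict quantized acoustic prominence features. As a rough illustration of what quantization could look like, the sketch below maps continuous per-word prominence values to discrete class labels using quantile-based bins. The binning scheme, the number of bins, the function names, and the sample values are all assumptions made for illustration; the abstract does not specify how the authors quantize their features.

```python
from bisect import bisect_right


def quantile_edges(values, n_bins):
    """Return n_bins - 1 cut points splitting `values` into roughly equal-sized bins.

    Assumption: quantile binning is used here only as an example scheme.
    """
    ordered = sorted(values)
    edges = []
    for k in range(1, n_bins):
        # Index of the k-th quantile boundary in the sorted list.
        idx = (k * len(ordered)) // n_bins
        edges.append(ordered[idx])
    return edges


def quantize(value, edges):
    """Map a continuous value to a bin index in [0, len(edges)]."""
    return bisect_right(edges, value)


# Hypothetical per-word prominence values (e.g. a normalized acoustic measure).
train_values = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
edges = quantile_edges(train_values, n_bins=3)

# Each word's value becomes a class label that a token classifier could predict.
labels = [quantize(v, edges) for v in train_values]
```

With discrete labels of this kind, prominence prediction can be framed as per-token classification, and a chosen label can be passed to a TTS model as a conditioning input.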
Taxonomy
Topics: Topic Modeling · Speech Recognition and Synthesis · Speech and dialogue systems
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Adam · Linear Warmup With Linear Decay · Weight Decay · Layer Normalization · WordPiece · Softmax
