Lombard Speech Synthesis for Any Voice with Controllable Style Embeddings
Seymanur Akti, Alexander Waibel

TL;DR
This paper introduces a controllable TTS system that synthesizes Lombard speech for any speaker by manipulating style embeddings, improving intelligibility in noisy environments without needing Lombard data during training.
Contribution
It presents a novel method that generates Lombard speech for any speaker by manipulating style embeddings with principal component analysis (PCA), without requiring explicit Lombard training data.
Findings
Enhanced speech intelligibility under noise
Preserved speaker identity and naturalness
Fine-grained control over Lombard speech style
Abstract
The Lombard effect plays a key role in natural communication, particularly in noisy environments or when addressing hearing-impaired listeners. We present a controllable text-to-speech (TTS) system capable of synthesizing Lombard speech for any speaker without requiring explicit Lombard data during training. Our approach leverages style embeddings learned from a large, prosodically diverse dataset and analyzes their correlation with Lombard attributes using principal component analysis (PCA). By shifting the relevant PCA components, we manipulate the style embeddings and incorporate them into our TTS model to generate speech at desired Lombard levels. Evaluations demonstrate that our method preserves naturalness and speaker identity, enhances intelligibility under noise, and provides fine-grained control over prosody, offering a robust solution for controllable Lombard TTS for any…
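The abstract describes the control mechanism only at a high level. It fits PCA to the style embeddings, finds the components that correlate with Lombard attributes, and shifts those components. The following minimal Python sketch illustrates that idea under several assumptions:

- NumPy is available.
- The style embeddings form a matrix with one row per utterance.
- Each utterance has a single scalar Lombard proxy, such as mean F0 or energy.

The function names are hypothetical. So are two design choices: shifting only the single most-correlated component, and a `level` knob measured in that component's standard deviations. None of these are taken from the paper, which may select and shift several components differently.

```python
import numpy as np


def fit_pca(embeddings):
    """PCA via SVD. Returns mean, principal axes (one per row), and component scores."""
    mean = embeddings.mean(axis=0)
    centered = embeddings - mean
    # Rows of vt are unit-length principal axes, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ vt.T  # projection of each utterance onto each axis
    return mean, vt, scores


def component_correlations(scores, attribute):
    """Pearson correlation between each component's scores and a per-utterance attribute."""
    corrs = np.array([np.corrcoef(scores[:, k], attribute)[0, 1]
                      for k in range(scores.shape[1])])
    return np.nan_to_num(corrs)  # zero-variance components get correlation 0


def lombard_shift(embedding, axes, scores, corrs, level):
    """Shift one embedding along the axis most correlated with the attribute.

    Hypothetical control knob: `level` is in units of that component's standard
    deviation; positive values move toward higher attribute values.
    """
    k = int(np.argmax(np.abs(corrs)))
    direction = np.sign(corrs[k]) * axes[k]  # orient the axis so + raises the attribute
    return embedding + level * scores[:, k].std() * direction


# Usage on synthetic data (an assumption, not data from the paper).
rng = np.random.default_rng(0)
emb = rng.normal(size=(500, 16))
emb[:, 3] *= 4.0  # one dominant style dimension
attr = emb[:, 3] + 0.1 * rng.normal(size=500)  # hypothetical Lombard proxy tied to dim 3
mean, axes, scores = fit_pca(emb)
corrs = component_correlations(scores, attr)
shifted = lombard_shift(emb[0], axes, scores, corrs, level=2.0)
```

Scaling the shift by the component's standard deviation keeps `level` comparable across components and embedding spaces. In a full system, the shifted embedding would be fed to the TTS model's style input, which this sketch does not model.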
Taxonomy
Topics: Speech Recognition and Synthesis · Phonetics and Phonology Research · Voice and Speech Disorders
