Voice Conversion for Lombard Speaking Style with Implicit and Explicit Acoustic Feature Conditioning

Dominika Woszczyk; Manuel Sam Ribeiro; Thomas Merritt; Daniel Korzekwa

arXiv:2507.09310 · cs.SD · July 15, 2025

TL;DR

This paper explores voice conversion (VC) with implicit and explicit acoustic feature conditioning to transfer the Lombard speaking style to new speakers, aiming to improve speech intelligibility and preserve the style without requiring extensive Lombard recordings from the target speaker.

Contribution

It introduces a novel implicit acoustic feature conditioning method for Lombard style transfer that matches explicit conditioning in intelligibility and maintains speaker similarity.

Findings

1. Implicit conditioning achieves intelligibility gains comparable to those of explicit conditioning.
2. The proposed method maintains speaker similarity.
3. Voice conversion can augment TTS training data in the Lombard speaking style.

Abstract

Text-to-Speech (TTS) systems in Lombard speaking style can improve the overall intelligibility of speech, which is useful for listeners with hearing loss and in noisy conditions. However, training such models requires a large amount of data, and the Lombard effect is challenging to record due to speaker and noise variability and tiring recording conditions. Voice conversion (VC) has been shown to be a useful augmentation technique for training TTS systems in the absence of recorded data from the target speaker in the target speaking style. In this paper, we are concerned with Lombard speaking style transfer. Our goal is to convert speaker identity while preserving the acoustic attributes that define the Lombard speaking style. We compare voice conversion models with implicit and explicit acoustic feature conditioning. We observe that our proposed implicit conditioning strategy achieves an intelligibility gain…

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Advanced Data Compression Techniques