Improving Voice Quality in Speech Anonymization With Just Perception-Informed Losses
Suhita Ghosh, Tim Thiele, Frederic Lorbeer, Frank Dreyer, Sebastian Stober

TL;DR
This paper introduces perception-informed loss functions for speech anonymization that improve voice naturalness and intelligibility while preserving speaker anonymity. The losses are model-agnostic and are evaluated on a VQVAE-based voice conversion model.
Contribution
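It proposes loss functions inspired by the human auditory system that improve voice quality in speech anonymization. A VQVAE-based model trained with these losses surpasses the vanilla model in naturalness, intelligibility, and prosody in both objective and subjective evaluations.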
Findings
Enhanced naturalness and intelligibility with perception-driven losses
Consistent improvements across languages and speaker genders
Outperforms vanilla models in subjective and objective evaluations
Abstract
The increasing use of cloud-based speech assistants has heightened the need for effective speech anonymization, which aims to obscure a speaker's identity while retaining critical information for subsequent tasks. One approach to achieving this is through voice conversion. While existing methods often emphasize complex architectures and training techniques, our research underscores the importance of loss functions inspired by the human auditory system. Our proposed loss functions are model-agnostic, incorporating handcrafted and deep learning-based features to effectively capture quality representations. Through objective and subjective evaluations, we demonstrate that a VQVAE-based model, enhanced with our perception-driven losses, surpasses the vanilla model in terms of naturalness, intelligibility, and prosody while maintaining speaker anonymity. These improvements are consistently…
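As a rough illustration of what a model-agnostic loss built on a handcrafted feature can look like, the sketch below adds a log-mel spectrogram distance to an arbitrary base training loss. The abstract does not say which features the authors use, so the choice of log-mel features, the function names (`log_mel`, `perceptual_loss`, `total_loss`), and all parameter values are assumptions made for illustration, not the paper's method.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters spaced evenly on the mel scale (a common
    # approximation of the ear's frequency resolution).
    freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, freqs.size))
    for i in range(n_mels):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

def log_mel(wave, sr=16000, n_fft=512, hop=128, n_mels=40):
    # Frame the signal, apply a Hann window, and project the power
    # spectrum onto the mel filterbank. Requires len(wave) >= n_fft.
    window = np.hanning(n_fft)
    n_frames = 1 + (len(wave) - n_fft) // hop
    frames = np.stack([wave[t * hop : t * hop + n_fft] * window for t in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return np.log(power @ mel_filterbank(n_mels, n_fft, sr).T + 1e-6)

def perceptual_loss(target, converted, **kw):
    # Hypothetical handcrafted-feature loss: mean absolute log-mel difference.
    return float(np.mean(np.abs(log_mel(target, **kw) - log_mel(converted, **kw))))

def total_loss(base_loss, target, converted, weight=1.0):
    # Model-agnostic in the sense that it only needs the two waveforms,
    # not any internals of the conversion model. `weight` is an assumed knob.
    return base_loss + weight * perceptual_loss(target, converted)
```

Because the extra term depends only on the target and converted waveforms, it could in principle be added to the objective of any voice conversion model. A training setup would need a differentiable version of the same computation in its deep learning framework.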
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing
