NTU-NPU System for Voice Privacy 2024 Challenge
Nikita Kuzmin, Hieu-Thi Luong, Jixun Yao, Lei Xie, Kong Aik Lee, Eng Siong Chng

TL;DR
This paper details enhancements to the baseline speech anonymization systems of the Voice Privacy Challenge 2024. It aims to improve both privacy and utility through emotion and speaker embeddings, speaker and prosody anonymization, and disentanglement techniques.
Contribution
We enhance the existing baselines with emotion embeddings and improved speaker and prosody anonymization, introduce Mean Reversion F0 for B5, and explore disentanglement models to improve privacy and utility.
Findings
Improved privacy for B5 with Mean Reversion F0, without a loss in utility.
Enhanced speaker anonymization in B3 using WavLM and ECAPA2 speaker embedders.
Disentanglement models (β-VAE and NaturalSpeech3 FACodec) show potential for balancing privacy and utility.
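In the Voice Privacy Challenge 2024 evaluation, privacy is reported as the equal error rate (EER) of an automatic speaker verification attacker: a higher EER means anonymized speech is harder to link to its source speaker. The sketch below shows how an EER can be computed from verification scores. `compute_eer` is a hypothetical helper, not the challenge's evaluation code, and it uses a simple threshold sweep over observed scores rather than an interpolated ROC curve.

```python
import numpy as np


def compute_eer(target_scores, nontarget_scores) -> float:
    """Approximate equal error rate by sweeping thresholds over observed scores.

    Hypothetical helper for illustration; a trial is accepted when its
    score is greater than or equal to the threshold.
    """
    tgt = np.asarray(target_scores, dtype=np.float64)
    non = np.asarray(nontarget_scores, dtype=np.float64)
    # Candidate thresholds: every distinct score, in ascending order.
    thresholds = np.unique(np.concatenate([tgt, non]))
    # False rejection: same-speaker trials scored below the threshold.
    frr = np.array([(tgt < t).mean() for t in thresholds])
    # False acceptance: different-speaker trials scored at or above it.
    far = np.array([(non >= t).mean() for t in thresholds])
    # EER is taken at the threshold where the two error rates are closest.
    idx = int(np.argmin(np.abs(frr - far)))
    return float((frr[idx] + far[idx]) / 2)
```

For example, `compute_eer([0.6, 0.4], [0.5, 0.3])` returns `0.5`, since at threshold 0.5 one target and one non-target trial are misclassified. Perfectly separated scores give an EER of 0.0, which in this setting would mean no privacy protection at all.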
Abstract
In this work, we describe our submissions for the Voice Privacy Challenge 2024. Rather than proposing a novel speech anonymization system, we enhance the provided baselines to meet all required conditions and improve the evaluated metrics. Specifically, we implement emotion embeddings and experiment with WavLM and ECAPA2 speaker embedders for the B3 baseline. Additionally, we compare different speaker and prosody anonymization techniques. Furthermore, we introduce Mean Reversion F0 for B5, which enhances privacy without a loss in utility. Finally, we explore disentanglement models, namely β-VAE and NaturalSpeech3 FACodec.
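Mean Reversion F0 is named but not defined in this summary. As an illustration only, the sketch below assumes one plausible reading of the name: each voiced frame's F0 is pulled toward the utterance mean by a factor `alpha`, damping speaker-specific pitch excursions while keeping the contour's overall shape. The function `mean_revert_f0`, its `alpha` parameter, and the log-domain option are hypothetical and may differ from the formulation actually used for B5.

```python
import numpy as np


def mean_revert_f0(f0, alpha: float = 0.5, log_domain: bool = True) -> np.ndarray:
    """Shrink voiced F0 values toward the utterance mean.

    Hypothetical illustration, not necessarily the method used for B5.
    alpha = 1.0 leaves the contour unchanged; alpha = 0.0 flattens every
    voiced frame to the mean. Unvoiced frames (F0 == 0) stay at 0.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    f0 = np.asarray(f0, dtype=np.float64)
    out = f0.copy()
    voiced = f0 > 0
    if not voiced.any():
        return out  # all-unvoiced input: nothing to modify
    # Optionally work in log-F0, where pitch intervals are roughly perceptual.
    values = np.log(f0[voiced]) if log_domain else f0[voiced]
    mean = values.mean()
    # Move each voiced value a fraction (1 - alpha) of the way toward the mean.
    reverted = mean + alpha * (values - mean)
    out[voiced] = np.exp(reverted) if log_domain else reverted
    return out
```

With `alpha=0.0` and `log_domain=False`, the contour `[0, 100, 200, 0]` becomes `[0, 150, 150, 0]`: unvoiced frames stay at zero and both voiced frames collapse to the linear mean. Intermediate values of `alpha` trade off how much pitch identity is removed against how much natural intonation is kept.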
