NTU-NPU System for Voice Privacy 2024 Challenge

Nikita Kuzmin; Hieu-Thi Luong; Jixun Yao; Lei Xie; Kong Aik Lee; Eng Siong Chng

arXiv:2410.02371·eess.AS·October 11, 2024


TL;DR

This paper details enhancements to baseline speech anonymization systems for the Voice Privacy Challenge 2024, focusing on improving privacy and utility through various embedding, anonymization, and disentanglement techniques.

Contribution

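We enhance the challenge baselines rather than propose a new system: we add emotion embedding to B3 and test WavLM and ECAPA2 speaker embedders, compare speaker and prosody anonymization techniques, introduce Mean Reversion F0 for B5, and explore disentanglement models ($\beta$-VAE and NaturalSpeech3 FACodec).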

Findings

01

Mean Reversion F0 improves privacy for the B5 baseline without a loss in utility.

02

Enhanced speaker anonymization using WavLM and ECAPA2.

03

Disentanglement models show potential for privacy-utility trade-offs.

Abstract

In this work, we describe our submissions for the Voice Privacy Challenge 2024. Rather than proposing a novel speech anonymization system, we enhance the provided baselines to meet all required conditions and improve evaluated metrics. Specifically, we implement emotion embedding and experiment with WavLM and ECAPA2 speaker embedders for the B3 baseline. Additionally, we compare different speaker and prosody anonymization techniques. Furthermore, we introduce Mean Reversion F0 for B5, which helps to enhance privacy without a loss in utility. Finally, we explore disentanglement models, namely $\beta$-VAE and NaturalSpeech3 FACodec.
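The abstract names $\beta$-VAE as one of the disentanglement models explored but gives no details of the setup. For readers unfamiliar with it, the block below is a minimal sketch of the standard $\beta$-VAE objective (reconstruction error plus a $\beta$-weighted KL term against a unit Gaussian prior). It is not the authors' configuration: the function name `beta_vae_loss`, the squared-error reconstruction term, and the default `beta=4.0` are all assumptions made for illustration.

```python
import numpy as np

def beta_vae_loss(x, x_recon, mu, log_var, beta=4.0):
    """Generic beta-VAE loss for a batch (illustrative, not the paper's setup).

    x, x_recon : arrays of shape (batch, features)
    mu, log_var: encoder outputs of shape (batch, latent_dim), describing
                 a diagonal Gaussian posterior q(z|x)
    beta       : weight on the KL term (hypothetical default)
    """
    # Squared-error reconstruction term: summed over features, averaged over batch.
    recon = np.mean(np.sum((x - x_recon) ** 2, axis=1))
    # Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) ) for each example.
    kl_per_example = 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var, axis=1)
    kl = np.mean(kl_per_example)
    # beta > 1 penalizes the KL term more strongly than a plain VAE does.
    return recon + beta * kl, recon, kl
```

Raising `beta` above 1 trades reconstruction quality for a more factorized latent space, which is the general sense in which such models relate to a privacy-utility trade-off.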
