MUSA: Multi-lingual Speaker Anonymization via Serial Disentanglement

Jixun Yao; Qing Wang; Pengcheng Guo; Ziqian Ning; Yuguang Yang; Yu Pan; Lei Xie

arXiv:2407.11629 · eess.AS · July 17, 2024

TL;DR

MUSA is a multi-lingual speaker anonymization method that uses serial disentanglement and semantic distillation to conceal speaker identity across languages while preserving speech content and para-linguistic features.

Contribution

The paper introduces MUSA, a multi-lingual speaker anonymization approach that combines serial disentanglement with a simple zero-vector anonymization strategy, with the aim of improving generalization across languages and reducing complexity.
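As a rough illustration of the zero-vector idea mentioned above, the following minimal sketch replaces a global speaker vector with zeros before it conditions a decoder. Everything in it is an assumption made for illustration: the function names `anonymize_speaker_embedding` and `toy_decoder_input`, the vector sizes, and the choice to concatenate the speaker vector onto each content frame are hypothetical and are not taken from the paper.

```python
from typing import List


def anonymize_speaker_embedding(speaker_embedding: List[float]) -> List[float]:
    """Return an all-zero vector of the same size (assumed zero-vector strategy)."""
    return [0.0] * len(speaker_embedding)


def toy_decoder_input(
    content_frames: List[List[float]], speaker_embedding: List[float]
) -> List[List[float]]:
    """Hypothetical conditioning: append the global speaker vector to every frame."""
    return [frame + speaker_embedding for frame in content_frames]


# Toy values: two time-variant content frames and one time-invariant speaker vector.
content = [[0.2, -0.1], [0.4, 0.3]]
speaker = [0.9, -0.7, 0.5]

anon = anonymize_speaker_embedding(speaker)
conditioned = toy_decoder_input(content, anon)
```

In this toy setup the content frames pass through unchanged, while the speaker slots carry no identity information; how the actual system consumes the zero vector is not specified on this page.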

Findings

01. Effective privacy protection across multiple languages.

02. Preserves linguistic and para-linguistic information.

03. Outperforms existing methods on official datasets.

Abstract

Speaker anonymization is an effective privacy protection solution designed to conceal the speaker's identity while preserving the linguistic content and para-linguistic information of the original speech. While most prior studies focus solely on a single language, an ideal speaker anonymization system should be capable of handling multiple languages. This paper proposes MUSA, a Multi-lingual Speaker Anonymization approach that employs a serial disentanglement strategy to perform a step-by-step disentanglement from a global time-invariant representation to a temporal time-variant representation. By utilizing semantic distillation and self-supervised speaker distillation, the serial disentanglement strategy can avoid strong inductive biases and exhibit superior generalization performance across different languages. Meanwhile, we propose a straightforward anonymization strategy that…

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and Audio Processing
