MUSA: Multi-lingual Speaker Anonymization via Serial Disentanglement
Jixun Yao, Qing Wang, Pengcheng Guo, Ziqian Ning, Yuguang Yang, Yu Pan, Lei Xie

TL;DR
MUSA is a multi-lingual speaker anonymization method that uses serial disentanglement and semantic distillation to effectively conceal speaker identity across languages while maintaining speech content and paralinguistic features.
Contribution
The paper introduces MUSA, a novel multi-lingual speaker anonymization approach employing serial disentanglement and a simple zero-vector anonymization strategy, enhancing generalization and reducing complexity.
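The zero-vector idea can be illustrated with a minimal sketch. It assumes only what the summary states: speech is disentangled into a global time-invariant representation (speaker) and a temporal time-variant one (content), and anonymization swaps the speaker part for zeros. The `Disentangled` container, the `anonymize` function, and the toy dimensions are hypothetical names chosen for illustration, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Disentangled:
    """Hypothetical container for the two representations named in the summary."""
    speaker: List[float]        # global, time-invariant vector (one per utterance)
    content: List[List[float]]  # temporal, time-variant features (one row per frame)


def anonymize(rep: Disentangled) -> Disentangled:
    """Return a copy whose speaker vector is all zeros; content frames are reused as-is."""
    zero_speaker = [0.0] * len(rep.speaker)
    return Disentangled(speaker=zero_speaker, content=rep.content)


# Toy values, invented purely for illustration.
original = Disentangled(
    speaker=[0.3, -1.2, 0.7],
    content=[[0.1, 0.2], [0.4, 0.5]],
)
anonymized = anonymize(original)
```

In this sketch no pseudo-speaker has to be selected or generated: the same zero vector is used for every utterance, and a downstream synthesizer (not shown) would be conditioned on it together with the unchanged content frames.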
Findings
Effective privacy protection across multiple languages.
Preserves linguistic and para-linguistic information.
Outperforms existing methods on official datasets.
Abstract
Speaker anonymization is an effective privacy protection solution designed to conceal the speaker's identity while preserving the linguistic content and para-linguistic information of the original speech. While most prior studies focus solely on a single language, an ideal speaker anonymization system should be capable of handling multiple languages. This paper proposes MUSA, a Multi-lingual Speaker Anonymization approach that employs a serial disentanglement strategy to perform a step-by-step disentanglement from a global time-invariant representation to a temporal time-variant representation. By utilizing semantic distillation and self-supervised speaker distillation, the serial disentanglement strategy can avoid strong inductive biases and exhibit superior generalization performance across different languages. Meanwhile, we propose a straightforward anonymization strategy that…
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and Audio Processing
