SEF-MK: Speaker-Embedding-Free Voice Anonymization through Multi-k-means Quantization
Beilong Tang, Xiaoxiao Miao, Xin Wang, Ming Li

TL;DR
SEF-MK is a speaker-embedding-free voice anonymization method that quantizes each utterance with one of several k-means models, chosen at random. Compared with a single k-means model, it better preserves linguistic and emotional content, but it also makes privacy attacks more effective.
Contribution
This paper proposes SEF-MK, a speaker-embedding-free voice anonymization framework based on multi-k-means quantization, and evaluates it from both the user's and the attacker's perspectives.
Findings
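Compared with a single k-means model, multiple k-means models better preserve linguistic and emotional content (user's perspective).
Using multiple k-means models also makes privacy attacks more effective (attacker's perspective).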
The approach offers insights into balancing privacy and content retention.
Abstract
Voice anonymization protects speaker privacy by concealing identity while preserving linguistic and paralinguistic content. Self-supervised learning (SSL) representations encode linguistic features but preserve speaker traits. We propose a novel speaker-embedding-free framework called SEF-MK. Instead of using a single k-means model trained on the entire dataset, SEF-MK anonymizes SSL representations for each utterance by randomly selecting one of multiple k-means models, each trained on a different subset of speakers. We explore this approach from both attacker and user perspectives. Extensive experiments show that, compared to a single k-means model, SEF-MK with multiple k-means models better preserves linguistic and emotional content from the user's viewpoint. However, from the attacker's perspective, utilizing multiple k-means models boosts the effectiveness of privacy attacks. These…
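The selection step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it uses toy 4-dimensional vectors in place of real SSL features, a plain Lloyd's k-means written with the standard library, and a disjoint round-robin split of speakers. The function names (train_multi_kmeans, anonymize_utterance), the disjoint split, and the choice to replace each frame with its nearest centroid are all assumptions made for illustration.

```python
import random

def sq_dist(a, b):
    # Squared Euclidean distance between two feature vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(centroids, frame):
    # Index of the centroid closest to the frame.
    return min(range(len(centroids)), key=lambda i: sq_dist(centroids[i], frame))

def fit_kmeans(frames, k, iters, rng):
    # Plain Lloyd's algorithm; centroids start from k randomly sampled frames.
    centroids = [list(f) for f in rng.sample(frames, k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for f in frames:
            buckets[nearest(centroids, f)].append(f)
        for i, bucket in enumerate(buckets):
            if bucket:  # an empty cluster keeps its previous centroid
                centroids[i] = [sum(col) / len(bucket) for col in zip(*bucket)]
    return centroids

def train_multi_kmeans(frames_by_speaker, n_models, k, rng, iters=10):
    # Assumption: speakers are split into disjoint subsets, one per model.
    speakers = sorted(frames_by_speaker)
    rng.shuffle(speakers)
    subsets = [speakers[i::n_models] for i in range(n_models)]
    models = []
    for subset in subsets:
        frames = [f for s in subset for f in frames_by_speaker[s]]
        models.append(fit_kmeans(frames, k, iters, rng))
    return models

def anonymize_utterance(frames, models, rng):
    # Pick one k-means model at random for the whole utterance, then
    # replace every frame with its nearest centroid from that model.
    centroids = rng.choice(models)
    return [list(centroids[nearest(centroids, f)]) for f in frames]

# Toy data: 6 hypothetical speakers, 30 frames each, 4 dimensions per frame.
rng = random.Random(0)
frames_by_speaker = {
    f"spk{s}": [[rng.gauss(s, 1.0) for _ in range(4)] for _ in range(30)]
    for s in range(6)
}
models = train_multi_kmeans(frames_by_speaker, n_models=3, k=5, rng=rng)
utterance = frames_by_speaker["spk0"][:8]
quantized = anonymize_utterance(utterance, models, rng)
```

In this sketch the single-model baseline mentioned in the abstract corresponds to n_models=1, where every utterance is quantized with the same codebook trained on all speakers.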
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
