SEF-MK: Speaker-Embedding-Free Voice Anonymization through Multi-k-means Quantization

Beilong Tang; Xiaoxiao Miao; Xin Wang; Ming Li

arXiv:2508.07086·cs.SD·August 19, 2025

TL;DR

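SEF-MK is a speaker-embedding-free voice anonymization method that anonymizes each utterance's SSL representations with one of several k-means models, selected at random. Compared with a single k-means model, it better preserves linguistic and emotional content, but multiple k-means models also make privacy attacks more effective when the attacker uses them.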

Contribution

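This paper presents a speaker-embedding-free voice anonymization framework based on multi-k-means quantization: each utterance is anonymized with a randomly selected k-means model, and each model is trained on a different subset of speakers. The framework is evaluated from both the user's and the attacker's perspective.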

Findings

01

Compared with a single k-means model, using multiple k-means models better preserves linguistic and emotional content from the user's viewpoint.

02

From the attacker's perspective, using multiple k-means models boosts the effectiveness of privacy attacks.

03

The approach offers insights into balancing privacy and content retention.

Abstract

Voice anonymization protects speaker privacy by concealing identity while preserving linguistic and paralinguistic content. Self-supervised learning (SSL) representations encode linguistic features but preserve speaker traits. We propose a novel speaker-embedding-free framework called SEF-MK. Instead of using a single k-means model trained on the entire dataset, SEF-MK anonymizes SSL representations for each utterance by randomly selecting one of multiple k-means models, each trained on a different subset of speakers. We explore this approach from both attacker and user perspectives. Extensive experiments show that, compared to a single k-means model, SEF-MK with multiple k-means models better preserves linguistic and emotional content from the user's viewpoint. However, from the attacker's perspective, utilizing multiple k-means models boosts the effectiveness of privacy attacks. These…
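The selection step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`fit_kmeans`, `train_codebooks`, `anonymize`), the plain Lloyd's k-means, the disjoint speaker split, and the synthetic features standing in for SSL representations are all assumptions made for illustration. It only shows the idea of training several k-means models on different speaker subsets and quantizing each utterance with one model picked at random.

```python
import numpy as np

def assign(frames, centroids):
    """Return the index of the nearest centroid for every frame."""
    d2 = ((frames[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)

def fit_kmeans(frames, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means (an assumption; any k-means trainer would do)."""
    rng = np.random.default_rng(seed)
    centroids = frames[rng.choice(len(frames), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = assign(frames, centroids)
        for j in range(k):
            members = frames[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    return centroids

def train_codebooks(features_by_speaker, n_models, k, seed=0):
    """Train one k-means codebook per speaker subset.

    Assumption: the subsets are disjoint and of roughly equal size.
    The abstract only says each model uses "a different subset of speakers".
    """
    rng = np.random.default_rng(seed)
    speakers = sorted(features_by_speaker)
    groups = np.array_split(rng.permutation(len(speakers)), n_models)
    codebooks = []
    for m, idx in enumerate(groups):
        frames = np.concatenate([features_by_speaker[speakers[i]] for i in idx])
        codebooks.append(fit_kmeans(frames, k, seed=seed + m))
    return codebooks

def anonymize(utterance, codebooks, rng):
    """Quantize one utterance with a randomly selected codebook."""
    codebook = codebooks[rng.integers(len(codebooks))]
    return codebook[assign(utterance, codebook)]

# Synthetic stand-in for frame-level SSL features: 6 speakers, 50 frames, 8 dims.
rng = np.random.default_rng(0)
features_by_speaker = {
    f"spk{i}": rng.normal(loc=i, size=(50, 8)) for i in range(6)
}
codebooks = train_codebooks(features_by_speaker, n_models=3, k=4)
utterance = rng.normal(size=(30, 8))
anonymized = anonymize(utterance, codebooks, rng)
```

In this sketch every frame of `anonymized` is a centroid from a single codebook, so the output keeps the utterance's frame count and feature dimension while discarding within-cluster variation. A downstream synthesis stage, which the abstract does not describe, is outside the scope of the example.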

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing