Stream-Voice-Anon: Enhancing Utility of Real-Time Speaker Anonymization via Neural Audio Codec and Language Models

Nikita Kuzmin, Songting Liu, Kong Aik Lee, Eng Siong Chng

arXiv:2601.13948 · eess.AS · February 6, 2026

TL;DR

Stream-Voice-Anon introduces a real-time speaker anonymization system built on a neural audio codec and language models, improving intelligibility and emotion preservation while balancing latency and privacy in streaming voice applications.

Contribution

Adapts neural audio codec and causal language model architectures, originally designed for voice conversion, to streaming speaker anonymization, adding techniques for privacy protection and prompt control that those voice conversion systems lack.

Findings

01

Reduces word error rate (WER) by up to 46%, improving intelligibility.

02

Preserves up to 28% more emotion information.

03

Maintains latency comparable to previous methods.
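This summary does not say whether the 46% figure is a relative or an absolute reduction. Assuming the usual relative convention (an assumption, with WER_base denoting the compared prior system and WER_new denoting Stream-Voice-Anon), the figure would be computed as follows, so that a 46% reduction means the new WER is 0.54 times the baseline WER:

```latex
\Delta_{\mathrm{rel}} = \frac{\mathrm{WER}_{\mathrm{base}} - \mathrm{WER}_{\mathrm{new}}}{\mathrm{WER}_{\mathrm{base}}} \times 100\%,
\qquad
\Delta_{\mathrm{rel}} = 46\% \;\Rightarrow\; \mathrm{WER}_{\mathrm{new}} = 0.54\,\mathrm{WER}_{\mathrm{base}}
```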

Abstract

Protecting speaker identity is crucial for online voice applications, yet streaming speaker anonymization (SA) remains underexplored. Recent research has demonstrated that neural audio codecs (NACs) provide superior speaker feature disentanglement and linguistic fidelity. NACs can also be combined with causal language models (LMs) to enhance linguistic fidelity and prompt control in streaming tasks. However, existing NAC-based online LM systems are designed for voice conversion (VC) rather than anonymization and lack the techniques required for privacy protection. Building on these advances, we present Stream-Voice-Anon, which adapts modern causal LM-based NAC architectures specifically for streaming SA by integrating anonymization techniques. Our anonymization approach incorporates pseudo-speaker representation sampling, speaker embedding mixing, and diverse prompt selection strategies for…
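The abstract names pseudo-speaker representation sampling and speaker embedding mixing, but the truncated text does not say how either is performed. The sketch below shows one common way to realize the two ideas in plain Python, under assumptions that are not taken from the paper: a pseudo-speaker is the renormalized mean of k embeddings drawn at random from an external speaker pool, and mixing is a convex combination of the source and pseudo-speaker embeddings controlled by a hypothetical weight `alpha`. The function names and the 16-dimensional random embeddings are illustrative only, and diverse prompt selection is not covered.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length (speaker embeddings are usually compared by cosine)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def sample_pseudo_speaker(pool: Sequence[Sequence[float]], k: int, rng: random.Random) -> Vector:
    """Hypothetical pseudo-speaker sampling: average k embeddings chosen at random
    (without replacement) from an external speaker pool, then renormalize."""
    chosen = rng.sample(list(pool), k)
    dim = len(chosen[0])
    mean = [sum(e[i] for e in chosen) / k for i in range(dim)]
    return l2_normalize(mean)


def mix_embeddings(source: Sequence[float], pseudo: Sequence[float], alpha: float) -> Vector:
    """Hypothetical speaker embedding mixing: alpha * pseudo + (1 - alpha) * source,
    renormalized. alpha = 1.0 discards the source speaker entirely."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    mixed = [alpha * p + (1.0 - alpha) * s for s, p in zip(source, pseudo)]
    return l2_normalize(mixed)


# Usage with random stand-in embeddings (not real speaker embeddings).
rng = random.Random(0)
pool = [l2_normalize([rng.gauss(0.0, 1.0) for _ in range(16)]) for _ in range(200)]
source = l2_normalize([rng.gauss(0.0, 1.0) for _ in range(16)])
pseudo = sample_pseudo_speaker(pool, k=10, rng=rng)
target = mix_embeddings(source, pseudo, alpha=0.8)
print(f"cos(source, target) = {cosine(source, target):.3f}")
```

Raising `alpha` moves the target embedding further from the source speaker; renormalizing keeps every embedding on the unit sphere so cosine comparisons stay meaningful. How the resulting embedding would condition the codec language model is outside this sketch.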

Taxonomy

Topics: Speech Recognition and Synthesis · Voice and Speech Disorders · Adversarial Robustness in Machine Learning