Privacy-Preserving End-to-End Full-Duplex Speech Dialogue Models

Nikita Kuzmin; Tao Zhong; Jiajun Deng; Yingke Zhu; Tristan Tsoi; Tianxiang Cao; Simon Lui; Kong Aik Lee; Eng Siong Chng

arXiv:2603.08179 · eess.AS · March 10, 2026

TL;DR

This paper investigates speaker privacy risks in full-duplex speech models and proposes anonymization methods that significantly reduce speaker leakage while maintaining speech recognition performance.

Contribution

It reveals the extent of speaker identity leakage in full-duplex speech models and introduces two effective anonymization techniques to mitigate this privacy risk.

Findings

01

Hidden states leak speaker identity across all layers.

02

Anon-W2F raises the speaker-verification equal error rate (EER) by over 3.5x, from 11.2% to 41.0%.

03

Anon-W2W preserves most speech recognition accuracy with low latency.

Abstract

End-to-end full-duplex speech models feed user audio through an always-on LLM backbone, yet the speaker privacy implications of their hidden representations remain unexamined. Following the VoicePrivacy 2024 protocol with a lazy-informed attacker, we show that the hidden states of SALM-Duplex and Moshi leak substantial speaker identity across all transformer layers. Layer-wise and turn-wise analyses show that SALM-Duplex leaks more strongly in early layers while Moshi leaks uniformly, and that Linkability rises sharply within the first few turns. We propose two streaming anonymization setups using Stream-Voice-Anon: a waveform-level front-end (Anon-W2W) and a feature-domain replacement (Anon-W2F). Anon-W2F raises EER by over 3.5x relative to the discrete encoder baseline (11.2% to 41.0%), approaching the 50% random-chance ceiling, while…
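To make the EER figures concrete, here is a minimal sketch of how an equal error rate can be computed from speaker-verification scores, together with the arithmetic behind the "over 3.5x" claim. The function name and the toy scores are hypothetical and chosen only for illustration; the paper's evaluation follows the VoicePrivacy 2024 protocol, whose tooling is not reproduced here.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER by sweeping every observed score as a threshold.

    target_scores: similarity scores for same-speaker trials.
    nontarget_scores: similarity scores for different-speaker trials.
    Returns the mean of false-accept and false-reject rates at the
    threshold where the two rates are closest.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = float("inf"), None
    for t in thresholds:
        # False accept: a different-speaker trial scores at or above t.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        # False reject: a same-speaker trial scores below t.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Toy, made-up scores: a perfectly separable attacker versus one whose
# scores carry no speaker information.
separable = equal_error_rate([0.9, 0.8], [0.1, 0.2])
uninformative = equal_error_rate([0.1, 0.9], [0.1, 0.9])

# Relative EER increase reported in the abstract (11.2% to 41.0%).
relative_increase = 41.0 / 11.2
```

An attacker that cannot distinguish speakers lands at an EER of 50%, which is why the abstract treats 50% as the ceiling: the closer the anonymized system's EER is to that value, the less speaker identity the attacker recovers.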

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Voice and Speech Disorders