DAST: A Dual-Stream Voice Anonymization Attacker with Staged Training

Ridwan Arefeen; Xiaoxiao Miao; Rong Tong; Aik Beng Ng; Simon See; Timothy Liu

arXiv:2603.12840·cs.SD·March 17, 2026

Open Access

TL;DR

This paper introduces DAST, a dual-stream attacker with staged training that effectively breaks voice anonymization by leveraging spectral and self-supervised features, achieving state-of-the-art attack performance.

Contribution

The paper proposes a novel dual-stream attacker with a three-stage training strategy to improve privacy evaluation of voice anonymization systems.
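
As a rough illustration of the dual-stream idea, the sketch below fuses two per-utterance embeddings, one standing in for the spectral stream and one for the self-supervised (SSL) stream, and compares utterances by cosine similarity. The concatenation-based fusion, the cosine scoring, the toy vectors, and all names (`l2_normalize`, `fuse`, `cosine_score`) are assumptions made for illustration; this summary does not specify how the paper fuses the two streams.

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (an all-zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse(spectral_emb: Sequence[float], ssl_emb: Sequence[float]) -> List[float]:
    """Hypothetical fusion: normalize each stream, then concatenate them."""
    return l2_normalize(spectral_emb) + l2_normalize(ssl_emb)


def cosine_score(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two fused embeddings."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


# Toy embeddings standing in for the outputs of the two parallel encoders.
enroll = fuse([0.9, 0.1, 0.0], [0.2, 0.8])
trial_same = fuse([0.8, 0.2, 0.1], [0.3, 0.7])
trial_diff = fuse([0.0, 0.1, 0.9], [0.9, 0.1])
```

In this toy setup, `cosine_score(enroll, trial_same)` comes out higher than `cosine_score(enroll, trial_diff)`, which is the behavior an attacker relies on when linking anonymized speech back to a speaker.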

Findings

01. Stage II training enhances cross-system robustness.

02. Fine-tuning on 10% of the data surpasses the current state of the art.

03. The attacker effectively exposes vulnerabilities in anonymization methods.

Abstract

Voice anonymization masks vocal traits while preserving linguistic content, which may still leak speaker-specific patterns. To assess and strengthen privacy evaluation, we propose a dual-stream attacker that fuses spectral and self-supervised learning features via parallel encoders with a three-stage training strategy. Stage I establishes foundational speaker-discriminative representations. Stage II leverages the shared identity-transformation characteristics of voice conversion and anonymization, exposing the model to diverse converted speech to build cross-system robustness. Stage III provides lightweight adaptation to target anonymized data. Results on the VoicePrivacy Attacker Challenge (VPAC) dataset demonstrate that Stage II is the primary driver of generalization, enabling strong attacking performance on unseen anonymization datasets. With Stage III, fine-tuning on only 10% of…
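
The three-stage strategy can be pictured as an ordered schedule in which each stage trains on a different kind of data. The sketch below is only a schematic under stated assumptions: the stage order, the stated purposes, and the 10% figure come from the abstract, while the names `Stage`, `SCHEDULE`, `subsample`, `run_schedule`, and `train_fn`, the data-source labels, and the use of the full data in Stages I and II are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass(frozen=True)
class Stage:
    name: str
    purpose: str
    data_source: str      # hypothetical label, not a real dataset name
    data_fraction: float  # share of the source used in this stage (assumed)


# Stage order and purposes follow the abstract; everything else is assumed.
SCHEDULE = (
    Stage("I", "foundational speaker-discriminative representations",
          "original_speech", 1.0),
    Stage("II", "cross-system robustness from diverse converted speech",
          "voice_converted_speech", 1.0),
    Stage("III", "lightweight adaptation to target anonymized data",
          "target_anonymized_speech", 0.10),
)


def subsample(items: Sequence[str], fraction: float, seed: int = 0) -> List[str]:
    """Draw a reproducible random subset containing `fraction` of the items."""
    if not items:
        return []
    rng = random.Random(seed)
    k = min(len(items), max(1, round(len(items) * fraction)))
    return rng.sample(list(items), k)


def run_schedule(
    datasets: Dict[str, Sequence[str]],
    train_fn: Callable[[Stage, List[str]], None],
    schedule: Sequence[Stage] = SCHEDULE,
) -> List[str]:
    """Run the stages in order, passing each stage's subset to `train_fn`."""
    log = []
    for stage in schedule:
        subset = subsample(datasets[stage.data_source], stage.data_fraction)
        train_fn(stage, subset)  # caller-supplied training step (hypothetical)
        log.append(f"stage {stage.name}: {len(subset)} utterances")
    return log
```

A caller would supply its own `train_fn` that continues training the same model across stages, so that Stage III starts from the weights produced by Stage II rather than from scratch.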

Taxonomy

Topics: Speech Recognition and Synthesis · Authorship Attribution and Profiling · Voice and Speech Disorders