VoxATtack: A Multimodal Attack on Voice Anonymization Systems

Ahmad Aloradi; Ünal Ege Gaznepoglu; Emanuël A. P. Habets; Daniel Tenbrinck

arXiv:2507.12081·eess.AS·May 21, 2026

TL;DR

VoxATtack is a multimodal de-anonymization model that combines acoustic and textual data to effectively attack voice anonymization systems, exposing vulnerabilities in current privacy protections.

Contribution

This work introduces VoxATtack, a novel dual-branch model that fuses acoustic and textual features to improve de-anonymization performance against voice anonymization systems.

Findings

01. Outperforms the top-ranking attackers on five VPAC benchmarks

02. Achieves state-of-the-art results when data augmentation is applied

03. Reveals vulnerabilities in current voice anonymization methods

Abstract

Voice anonymization systems aim to protect speaker privacy by obscuring vocal traits while preserving the linguistic content relevant for downstream applications. However, because these linguistic cues remain intact, they can be exploited to identify semantic speech patterns associated with specific speakers. In this work, we present VoxATtack, a novel multimodal de-anonymization model that incorporates both acoustic and textual information to attack anonymization systems. While previous research has focused on refining speaker representations extracted from speech, we show that incorporating textual information with a standard ECAPA-TDNN improves the attacker's performance. Our proposed VoxATtack model employs a dual-branch architecture, with an ECAPA-TDNN processing anonymized speech and a pretrained BERT encoding the transcriptions. Both outputs are projected into embeddings of equal…
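The dual-branch idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: random vectors stand in for the ECAPA-TDNN and BERT outputs, the dimensions (`ACOUSTIC_DIM`, `TEXT_DIM`, `SHARED_DIM`) are hypothetical placeholders, the projections are random rather than learned, and the fusion step is an assumed element-wise average, since the abstract is cut off before it describes how the two embeddings are combined.

```python
import math
import random

# Hypothetical sizes, chosen only for illustration (not taken from the paper).
ACOUSTIC_DIM = 192   # placeholder size for the speech-branch output
TEXT_DIM = 768       # placeholder size for the text-branch output
SHARED_DIM = 64      # both branches are projected to this common size


def make_projection(in_dim, out_dim, rng):
    """Random linear map standing in for a learned projection layer."""
    scale = 1.0 / math.sqrt(in_dim)
    return [[rng.gauss(0.0, scale) for _ in range(in_dim)]
            for _ in range(out_dim)]


def project(weights, vec):
    """Apply a linear projection (matrix-vector product)."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def l2_normalize(vec):
    """Scale a vector to unit length; leave an all-zero vector unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse(acoustic_emb, text_emb):
    """ASSUMPTION: element-wise average; the paper's fusion may differ."""
    return l2_normalize([(a + t) / 2.0 for a, t in zip(acoustic_emb, text_emb)])


def cosine(u, v):
    """Cosine similarity, a common score for comparing speaker embeddings."""
    u, v = l2_normalize(u), l2_normalize(v)
    return sum(a * b for a, b in zip(u, v))


rng = random.Random(0)
acoustic_proj = make_projection(ACOUSTIC_DIM, SHARED_DIM, rng)
text_proj = make_projection(TEXT_DIM, SHARED_DIM, rng)

# Random vectors stand in for real encoder outputs on one utterance.
speech_features = [rng.gauss(0.0, 1.0) for _ in range(ACOUSTIC_DIM)]
text_features = [rng.gauss(0.0, 1.0) for _ in range(TEXT_DIM)]

# Project both branches to the same size, then fuse them.
acoustic_emb = project(acoustic_proj, speech_features)
text_emb = project(text_proj, text_features)
fused = fuse(acoustic_emb, text_emb)
```

In an attack setting, a fused embedding of this kind would be computed for an enrollment utterance and a trial utterance, and `cosine` would score whether they come from the same speaker. The averaging step is only a stand-in and could be swapped for whatever fusion the full paper specifies.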

Taxonomy

Topics: Speech Recognition and Synthesis · Hate Speech and Cyberbullying Detection · Authorship Attribution and Profiling