VoxATtack: A Multimodal Attack on Voice Anonymization Systems
Ahmad Aloradi, Ünal Ege Gaznepoglu, Emanuël A. P. Habets, Daniel Tenbrinck

TL;DR
VoxATtack is a multimodal de-anonymization model that combines acoustic and textual data to effectively attack voice anonymization systems, exposing vulnerabilities in current privacy protections.
Contribution
This work introduces VoxATtack, a novel dual-branch model that fuses acoustic and textual features to improve de-anonymization performance against voice anonymization systems.
Findings
Outperforms the top-ranking attackers on five VoicePrivacy Attacker Challenge (VPAC) benchmarks
Achieves state-of-the-art results when data augmentation is applied
Reveals vulnerabilities in current voice anonymization methods
Abstract
Voice anonymization systems aim to protect speaker privacy by obscuring vocal traits while preserving the linguistic content relevant for downstream applications. However, because these linguistic cues remain intact, they can be exploited to identify semantic speech patterns associated with specific speakers. In this work, we present VoxATtack, a novel multimodal de-anonymization model that incorporates both acoustic and textual information to attack anonymization systems. While previous research has focused on refining speaker representations extracted from speech, we show that incorporating textual information with a standard ECAPA-TDNN improves the attacker's performance. Our proposed VoxATtack model employs a dual-branch architecture, with an ECAPA-TDNN processing anonymized speech and a pretrained BERT encoding the transcriptions. Both outputs are projected into embeddings of equal…
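The dual-branch idea described in the abstract can be illustrated with a minimal, standard-library sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the input sizes (192 for an ECAPA-TDNN-style speaker embedding, 768 for a BERT-base-style sentence vector), the shared size `SHARED_DIM`, the random projection weights standing in for trained layers, and the fusion rule (a plain average of the two normalized projections). The helper names `project`, `l2_normalize`, `fuse`, and `cosine` are hypothetical.

```python
import math
import random

ACOUSTIC_DIM = 192  # assumed size of the acoustic-branch output
TEXT_DIM = 768      # assumed size of the text-branch output
SHARED_DIM = 128    # hypothetical common embedding size


def random_matrix(rows, cols, rng):
    """Random weights standing in for a trained linear projection."""
    scale = 1.0 / math.sqrt(cols)
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def project(features, weights):
    """Linear projection: one output value per weight row."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def fuse(acoustic, textual, w_acoustic, w_text):
    """Project both branches to SHARED_DIM and average them.

    Averaging is an illustrative assumption, not the paper's fusion rule.
    """
    a = l2_normalize(project(acoustic, w_acoustic))
    t = l2_normalize(project(textual, w_text))
    return l2_normalize([(x + y) / 2.0 for x, y in zip(a, t)])


def cosine(u, v):
    """Cosine similarity, a common score for comparing speaker embeddings."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


rng = random.Random(0)
w_acoustic = random_matrix(SHARED_DIM, ACOUSTIC_DIM, rng)
w_text = random_matrix(SHARED_DIM, TEXT_DIM, rng)

# Placeholder branch outputs; a real system would obtain these from the
# acoustic and text encoders applied to an anonymized utterance.
acoustic_a = [rng.gauss(0.0, 1.0) for _ in range(ACOUSTIC_DIM)]
textual_a = [rng.gauss(0.0, 1.0) for _ in range(TEXT_DIM)]
acoustic_b = [rng.gauss(0.0, 1.0) for _ in range(ACOUSTIC_DIM)]
textual_b = [rng.gauss(0.0, 1.0) for _ in range(TEXT_DIM)]

fused = fuse(acoustic_a, textual_a, w_acoustic, w_text)
other = fuse(acoustic_b, textual_b, w_acoustic, w_text)
score = cosine(fused, other)
```

In this sketch, `score` plays the role of a verification score between two utterances: an attacker would compare the fused embedding of an anonymized trial utterance against that of an enrollment utterance and link them to the same speaker when the score is high.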
Taxonomy
Topics: Speech Recognition and Synthesis · Hate Speech and Cyberbullying Detection · Authorship Attribution and Profiling
