MDD: a Mask Diffusion Detector to Protect Speaker Verification Systems from Adversarial Perturbations

Yibo Bai; Sizhou Chen; Michele Panariello; Xiao-Lei Zhang; Massimiliano Todisco; Nicholas Evans

arXiv:2508.19180·eess.AS·August 27, 2025

TL;DR

This paper introduces MDD, a text-conditioned masked diffusion framework that detects adversarial inputs to speaker verification systems and purifies them, without needing adversarial examples or large-scale pretraining.

Contribution

MDD is a detector built on a text-conditioned masked diffusion model. It both detects and purifies adversarial inputs to speaker verification, and is reported to outperform prior diffusion-based and neural codec-based methods without training on adversarial data.

Findings

01. MDD detects adversarial inputs more accurately than prior state-of-the-art methods.

02. MDD's purification effectively restores speaker verification performance.

03. The method does not require adversarial examples or large-scale pretraining.

Abstract

Speaker verification systems are increasingly deployed in security-sensitive applications but remain highly vulnerable to adversarial perturbations. In this work, we propose the Mask Diffusion Detector (MDD), a novel adversarial detection and purification framework based on a text-conditioned masked diffusion model. During training, MDD applies partial masking to Mel-spectrograms and progressively adds noise through a forward diffusion process, simulating the degradation of clean speech features. A reverse process then reconstructs the clean representation conditioned on the input transcription. Unlike prior approaches, MDD does not require adversarial examples or large-scale pretraining. Experimental results show that MDD achieves strong adversarial detection performance and outperforms prior state-of-the-art methods, including both diffusion-based and neural codec-based…
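
The training-time corruption described in the abstract (partial masking of a Mel-spectrogram followed by forward diffusion noise) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a standard DDPM-style closed-form forward step with a linear beta schedule, and it assumes that only the masked time frames are noised while the remaining frames stay clean as context; the abstract does not confirm either choice. The names `alpha_bar` and `mask_and_diffuse`, the mask ratio, and the schedule constants are all hypothetical.

```python
import math
import random


def alpha_bar(t, num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_s) over the first t steps.

    Assumes a linear beta schedule (an illustrative choice, not from the paper).
    """
    prod = 1.0
    for s in range(t):
        beta = beta_start + (beta_end - beta_start) * s / (num_steps - 1)
        prod *= 1.0 - beta
    return prod


def mask_and_diffuse(mel, t, mask_ratio=0.5, seed=0):
    """Noise a random subset of time frames of `mel` to diffusion step t.

    `mel` is a list of frames, each a list of Mel-bin values.
    Returns (noisy_mel, mask); mask[i] is True for frames that were noised.
    Assumption: unmasked frames are left untouched as clean context.
    """
    rng = random.Random(seed)
    num_frames = len(mel)
    num_masked = round(mask_ratio * num_frames)
    masked = set(rng.sample(range(num_frames), num_masked))

    # Closed-form forward step: x_t = sqrt(a_bar) * x_0 + sqrt(1 - a_bar) * eps
    a_bar = alpha_bar(t)
    signal_scale = math.sqrt(a_bar)
    noise_scale = math.sqrt(1.0 - a_bar)

    noisy = []
    for i, frame in enumerate(mel):
        if i in masked:
            noisy.append([signal_scale * x + noise_scale * rng.gauss(0.0, 1.0)
                          for x in frame])
        else:
            noisy.append(list(frame))
    mask = [i in masked for i in range(num_frames)]
    return noisy, mask
```

In this sketch, a reverse model would be trained to recover the clean masked frames from `noisy`, `mask`, the step `t`, and the transcription; that model is outside the scope of the example.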
