DUAP: Dual-task Universal Adversarial Perturbations Against Voice Control Systems

Suyang Sun; Weifei Jin; Yuxin Cao; Wei Song; Jie Hao

arXiv:2601.12786 · cs.CR · April 1, 2026

TL;DR

This paper introduces DUAP, a universal adversarial perturbation that attacks automatic speech recognition (ASR) and speaker recognition (SR) in voice control systems at the same time, while using psychoacoustic masking to keep the perturbation imperceptible.

Contribution

The paper proposes DUAP, a dual-task attack that pairs a targeted surrogate objective for disrupting ASR transcription with a Dynamic Normalized Ensemble (DNE) strategy for transferring across diverse SR models, and adds psychoacoustic masking to keep the perturbation imperceptible.

Findings

01  DUAP achieves high success rates against multiple ASR and SR models.

02  DUAP outperforms existing single-task adversarial attacks in effectiveness.

03  Perturbations generated by DUAP are highly imperceptible due to psychoacoustic masking.

Abstract

Modern Voice Control Systems (VCS) rely on the collaboration of Automatic Speech Recognition (ASR) and Speaker Recognition (SR) for secure interaction. However, prior adversarial attacks typically target these tasks in isolation, overlooking the coupled decision pipeline in real-world scenarios. Consequently, single-task attacks often fail to pose a practical threat. To fill this gap, we first utilize gradient analysis to reveal that ASR and SR exhibit no inherent conflicts. Building on this, we propose Dual-task Universal Adversarial Perturbation (DUAP). Specifically, DUAP employs a targeted surrogate objective to effectively disrupt ASR transcription and introduces a Dynamic Normalized Ensemble (DNE) strategy to enhance transferability across diverse SR models. Furthermore, we incorporate psychoacoustic masking to ensure perturbation imperceptibility. Extensive evaluations across five…
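
The abstract does not say how the gradient analysis is carried out. One common way to check whether two objectives conflict is to measure the cosine similarity between their gradients with respect to the shared perturbation: a negative value means the tasks pull in opposing directions. The sketch below illustrates that idea only. It assumes toy quadratic losses in place of real ASR and SR models. The helper `toy_grad`, the target vectors, and the zero threshold are hypothetical and not taken from the paper.

```python
import numpy as np


def cosine_similarity(g1, g2, eps=1e-12):
    """Cosine of the angle between two flattened gradient vectors."""
    g1 = np.ravel(g1)
    g2 = np.ravel(g2)
    return float(np.dot(g1, g2) / (np.linalg.norm(g1) * np.linalg.norm(g2) + eps))


def gradients_conflict(g1, g2):
    """Treat two task gradients as conflicting when their cosine is negative."""
    return cosine_similarity(g1, g2) < 0.0


def toy_grad(delta, target):
    """Gradient of the toy loss 0.5 * ||delta - target||^2 with respect to delta.

    Hypothetical stand-in for back-propagating through an ASR or SR model.
    """
    return delta - target


# Shared perturbation and two illustrative (made-up) task targets.
delta = np.zeros(4)
asr_target = np.array([1.0, 1.0, 0.0, 0.0])
sr_target = np.array([1.0, 0.0, 1.0, 0.0])

g_asr = toy_grad(delta, asr_target)
g_sr = toy_grad(delta, sr_target)

sim = cosine_similarity(g_asr, g_sr)
print(f"cosine similarity: {sim:.3f}, conflict: {gradients_conflict(g_asr, g_sr)}")
```

With real models, `g_asr` and `g_sr` would be the gradients of each task's attack loss with respect to the same perturbation, averaged over a batch of utterances. A similarity that stays non-negative across batches would be consistent with the abstract's claim of no inherent conflict.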
