Towards Evaluating the Robustness of Automatic Speech Recognition Systems via Audio Style Transfer

Weifei Jin; Yuxin Cao; Junjie Su; Qi Shen; Kai Ye; Derui Wang; Jie Hao; Ziyao Liu

arXiv:2405.09470 · cs.SD · May 16, 2024

Open Access

TL;DR

This paper introduces a novel style transfer-based attack on ASR systems that allows user customization, achieving high success rates while maintaining audio naturalness, addressing limitations of previous adversarial methods.

Contribution

The paper proposes a new attack method combining style transfer and adversarial techniques, improving controllability and naturalness in attacking ASR systems.

Findings

01. Achieves 82% attack success rate.

02. Maintains sound naturalness according to user studies.

03. Enables user-customized audio styles in adversarial attacks.

Abstract

In light of the widespread application of Automatic Speech Recognition (ASR) systems, their security concerns have received much more attention than ever before, primarily due to the susceptibility of Deep Neural Networks. Previous studies have illustrated that surreptitiously crafting adversarial perturbations enables the manipulation of speech recognition systems, resulting in the production of malicious commands. These attack methods mostly require adding noise perturbations under $ℓ_{p}$ norm constraints, inevitably leaving behind artifacts of manual modifications. Recent research has alleviated this limitation by manipulating style vectors to synthesize adversarial examples based on Text-to-Speech (TTS) synthesis audio. However, style modifications based on optimization objectives significantly reduce the controllability and editability of audio styles. In this paper, we propose…
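As a point of reference for the $ℓ_{p}$-constrained noise attacks that the abstract contrasts against, the following is a minimal sketch of how a perturbation is typically kept inside an $ℓ_{∞}$ ball before being added to a waveform. It is not the method proposed in the paper. The function names (`lp_norm`, `project_linf`, `apply_perturbation`), the toy sine waveform, the random stand-in perturbation, and the value of `epsilon` are all assumptions chosen for illustration.

```python
import math
import random

def lp_norm(delta, p):
    """Return the l_p norm of a perturbation (p may be math.inf)."""
    if p == math.inf:
        return max(abs(d) for d in delta)
    return sum(abs(d) ** p for d in delta) ** (1.0 / p)

def project_linf(delta, epsilon):
    """Clip every sample of the perturbation into [-epsilon, epsilon]."""
    return [max(-epsilon, min(epsilon, d)) for d in delta]

def apply_perturbation(audio, delta, epsilon):
    """Add an l_inf-bounded perturbation and keep samples in [-1, 1]."""
    bounded = project_linf(delta, epsilon)
    return [max(-1.0, min(1.0, a + d)) for a, d in zip(audio, bounded)]

# Toy waveform (assumption): 100 samples of a 440 Hz tone at 16 kHz.
audio = [0.5 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(100)]

# Random noise stands in for a perturbation that a real attack would
# obtain by optimizing against the target ASR model.
rng = random.Random(0)
delta = [rng.uniform(-0.1, 0.1) for _ in audio]

epsilon = 0.01  # hypothetical perturbation budget
adversarial = apply_perturbation(audio, delta, epsilon)
```

In a real attack the projection step would run after each optimization update; the additive noise it bounds is what the abstract describes as leaving artifacts of manual modification.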

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing