Audio Adversarial Examples for Robust Hybrid CTC/Attention Speech Recognition

Ludwig Kürzinger; Edgar Ricardo Chavez Rosas; Lujun Li; Tobias Watzel; Gerhard Rigoll

arXiv:2007.10723·eess.AS·July 22, 2020

TL;DR

This paper introduces algorithms to generate audio adversarial examples targeting hybrid CTC/attention speech recognition systems, demonstrating their effectiveness and use in adversarial training to improve model robustness.

Contribution

The paper proposes algorithms for generating audio adversarial examples against hybrid CTC/attention ASR models, combining the CTC-based and attention-based approaches into a joint CTC-attention gradient method.
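
One way to picture a joint gradient method is as a weighted sum of a CTC gradient and an attention gradient, followed by a small signed step on the audio samples. The sketch below is illustrative only and is not the authors' implementation: the weight `lam`, the step size `eps`, the FGSM-style sign update, the targeted (loss-descending) direction, and the toy quadratic losses standing in for real CTC and attention losses are all assumptions made for this example.

```python
from typing import Callable, List, Sequence

Vector = List[float]
GradFn = Callable[[Sequence[float]], Vector]


def sign(value: float) -> int:
    """Return -1, 0, or 1 depending on the sign of value."""
    return (value > 0) - (value < 0)


def joint_gradient_step(
    x: Sequence[float],
    grad_ctc: GradFn,
    grad_att: GradFn,
    lam: float = 0.5,
    eps: float = 0.01,
) -> Vector:
    """Take one signed step along a weighted sum of two loss gradients.

    lam weights the CTC gradient and (1 - lam) the attention gradient.
    The step descends the joint loss, as a targeted attack would
    (hypothetical choice for this sketch).
    """
    g_ctc = grad_ctc(x)
    g_att = grad_att(x)
    joint = [lam * gc + (1.0 - lam) * ga for gc, ga in zip(g_ctc, g_att)]
    return [xi - eps * sign(g) for xi, g in zip(x, joint)]


# Toy stand-ins for the two loss gradients (NOT real CTC/attention losses):
# each is the gradient of 0.5 * ||x - target||^2, i.e. x - target.
def toy_grad_ctc(x: Sequence[float]) -> Vector:
    target = [1.0, 1.0, 1.0]
    return [xi - ti for xi, ti in zip(x, target)]


def toy_grad_att(x: Sequence[float]) -> Vector:
    target = [1.0, 0.0, -2.0]
    return [xi - ti for xi, ti in zip(x, target)]


if __name__ == "__main__":
    audio = [0.0, 1.0, -1.0]
    print(joint_gradient_step(audio, toy_grad_ctc, toy_grad_att, lam=0.5, eps=0.1))
```

In a real system the two gradient functions would come from backpropagating the CTC loss and the attention decoder loss to the input audio. Setting `lam` to 1.0 or 0.0 reduces the step to a pure CTC or pure attention gradient step.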

Findings

01 Successfully generated audio adversarial examples (AAEs) for hybrid CTC/attention models

02 Demonstrated improved robustness through adversarial training

03 Evaluated on two reference sentences and the TEDlium v2 speech recognition task

Abstract

Recent advances in Automatic Speech Recognition (ASR) have demonstrated that end-to-end systems can achieve state-of-the-art performance. There is a trend towards deeper neural networks; however, those ASR models are also more complex and vulnerable to specially crafted noisy data. Such Audio Adversarial Examples (AAEs) were previously demonstrated on ASR systems that use Connectionist Temporal Classification (CTC), as well as on attention-based encoder-decoder architectures. Following the idea of the hybrid CTC/attention ASR system, this work proposes algorithms to generate AAEs that combine both approaches into a joint CTC-attention gradient method. Evaluation is performed using a hybrid CTC/attention end-to-end ASR model on two reference sentences as a case study, as well as on the TEDlium v2 speech recognition task. We then demonstrate the application of this algorithm for adversarial…
