Audio Adversarial Examples for Robust Hybrid CTC/Attention Speech Recognition
Ludwig Kürzinger, Edgar Ricardo Chavez Rosas, Lujun Li, Tobias Watzel, Gerhard Rigoll

TL;DR
This paper introduces algorithms to generate audio adversarial examples targeting hybrid CTC/attention speech recognition systems, demonstrating their effectiveness and use in adversarial training to improve model robustness.
Contribution
It proposes novel algorithms for creating audio adversarial examples for hybrid CTC/attention ASR models, combining the CTC-based and attention-based approaches into a joint CTC-attention gradient method.
Findings
Successfully generated audio adversarial examples (AAEs) for hybrid CTC/attention models
Demonstrated improved robustness through adversarial training
Validated on two reference sentences and the TEDlium v2 dataset
Abstract
Recent advances in Automatic Speech Recognition (ASR) have demonstrated that end-to-end systems can achieve state-of-the-art performance. There is a trend towards deeper neural networks; however, those ASR models are also more complex and more vulnerable to specially crafted noisy data. Such Audio Adversarial Examples (AAEs) were previously demonstrated on ASR systems that use Connectionist Temporal Classification (CTC), as well as on attention-based encoder-decoder architectures. Following the idea of the hybrid CTC/attention ASR system, this work proposes algorithms that generate AAEs by combining both approaches into a joint CTC-attention gradient method. Evaluation is performed using a hybrid CTC/attention end-to-end ASR model on two reference sentences as a case study, as well as on the TEDlium v2 speech recognition task. We then demonstrate the application of this algorithm for adversarial…
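The idea of a joint CTC-attention gradient can be illustrated with a toy sketch. It is not the paper's implementation: the two quadratic losses stand in for the CTC and attention losses of a real ASR model, and the names `joint_gradient_sign_step`, `weight`, and `epsilon`, the weighted-sum combination, and the single untargeted sign step are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical stand-ins for the two loss branches of a hybrid model.
# A real system would backpropagate through the ASR network instead.
A = np.ones(4)        # toy "CTC" optimum
B = 2.0 * np.ones(4)  # toy "attention" optimum

def loss_ctc(x):
    return 0.5 * np.sum((x - A) ** 2)

def grad_ctc(x):
    return x - A

def loss_att(x):
    return 0.5 * np.sum((x - B) ** 2)

def grad_att(x):
    return x - B

def joint_loss(x, weight=0.5):
    # Assumed combination: weighted sum of both branch losses.
    return weight * loss_ctc(x) + (1.0 - weight) * loss_att(x)

def joint_gradient_sign_step(x, weight=0.5, epsilon=0.01):
    """One sign-gradient step on the weighted sum of both gradients.

    Untargeted variant (assumption): move in the direction that
    increases the joint loss, bounded by epsilon per sample.
    """
    joint_grad = weight * grad_ctc(x) + (1.0 - weight) * grad_att(x)
    return x + epsilon * np.sign(joint_grad)

x = np.zeros(4)                       # toy "audio" input
x_adv = joint_gradient_sign_step(x)   # perturbed input
```

In this sketch the perturbation never exceeds `epsilon` in any sample, and the joint loss on `x_adv` is higher than on `x`. Adversarial training would then mix such perturbed inputs into the training data.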
