On the robustness of non-intrusive speech quality model by adversarial examples
Hsin-Yi Lin, Huan-Hsin Tseng, Yu Tsao

TL;DR
This paper demonstrates that deep learning speech quality models are vulnerable to subtle adversarial attacks and shows that adversarial training can improve their robustness against such perturbations.
Contribution
It reveals the vulnerability of deep speech quality models to adversarial examples and evaluates adversarial training as a method to enhance their stability.
Findings
Deep speech quality models can be fooled by unnoticeable perturbations as weak as -30 dB relative to the speech input.
Adversarial training improves the robustness of speech quality prediction models.
This vulnerability raises potential reliability issues for deploying these models in real-world scenarios.
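For a sense of scale, here is one way to read "-30 dB". It assumes, as is common, a power ratio between the perturbation $\delta$ and the speech signal $x$ over $N$ samples. The summary itself does not define the measure.

```latex
L_{\delta} = 10\log_{10}\frac{\sum_{n=1}^{N}\delta[n]^{2}}{\sum_{n=1}^{N}x[n]^{2}} = -30\ \text{dB}
\;\Longrightarrow\;
\frac{\sum_{n}\delta[n]^{2}}{\sum_{n}x[n]^{2}} = 10^{-3},
\qquad
\frac{\operatorname{RMS}(\delta)}{\operatorname{RMS}(x)} = 10^{-30/20} \approx 0.032
```

Under this reading, the perturbation carries one thousandth of the speech power, and its RMS amplitude is roughly 3% of the speech's.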
Abstract
It has been shown recently that deep-learning-based models are effective at speech quality prediction and can outperform traditional metrics in various respects. Although network models have the potential to serve as a surrogate for complex human hearing perception, their predictions may be unstable. This work shows that deep speech quality predictors can be vulnerable to adversarial perturbations: the prediction can be changed drastically by unnoticeable perturbations as small as -30 dB relative to the speech input. In addition to exposing this vulnerability of deep speech quality predictors, we further explore and confirm the viability of adversarial training for strengthening model robustness.
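This sketch makes the idea of an adversarial perturbation concrete. It applies the fast gradient sign method (FGSM) to a hypothetical linear quality predictor. FGSM is a standard attack from the adversarial-examples literature, substituted here for illustration, since the summary does not say which attack the authors used. Several details are assumptions and are not taken from the paper: the linear model, its random weights, the 220 Hz tone standing in for speech, and the 16 kHz sampling rate. Only the -30 dB budget comes from the text.

```python
import math
import random


def rms(signal):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(s * s for s in signal) / len(signal))


def predict_quality(weights, bias, signal):
    """Hypothetical linear quality predictor: bias plus a weighted sum of samples."""
    return bias + sum(w * s for w, s in zip(weights, signal))


def fgsm_lower_quality(weights, signal, level_db):
    """Single FGSM step that pushes the predicted quality score down.

    For a linear model the gradient of the score w.r.t. the input is just
    `weights`, so stepping by -eps * sign(weights) lowers the score as much as
    possible under an L-infinity budget eps. eps is chosen so the perturbation
    power sits `level_db` dB relative to the signal power (this assumes every
    weight is nonzero, so every perturbed sample has magnitude eps).
    """
    eps = rms(signal) * 10.0 ** (level_db / 20.0)
    delta = [-math.copysign(eps, w) for w in weights]
    adversarial = [s + d for s, d in zip(signal, delta)]
    return adversarial, delta, eps


random.seed(0)
n = 16000  # 1 s of audio at an assumed 16 kHz sampling rate
speech = [0.1 * math.sin(2 * math.pi * 220 * t / 16000) for t in range(n)]  # toy stand-in for speech
weights = [random.gauss(0.0, 0.05) for _ in range(n)]  # hypothetical, untrained weights
bias = 3.0  # hypothetical MOS-like offset

adv, delta, eps = fgsm_lower_quality(weights, speech, -30.0)
clean_score = predict_quality(weights, bias, speech)
adv_score = predict_quality(weights, bias, adv)
level = 20 * math.log10(rms(delta) / rms(speech))

print(f"perturbation level: {level:.1f} dB")  # perturbation level: -30.0 dB
print(f"per-sample step: {eps:.5f}, score drop: {clean_score - adv_score:.2f}")
```

For this toy model, the score drop equals eps times the sum of |weights|. It therefore grows with input length, even though each sample moves by only about 2% of the tone's peak amplitude. This gives one informal view of why models over high-dimensional audio can be so sensitive. In its most common form, adversarial training would add such perturbed inputs, paired with the clean inputs' quality labels, back into the training set. That retraining step is not shown here and may differ from the procedure used in the paper.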
Taxonomy
Topics: Speech and Audio Processing · Ultrasonics and Acoustic Wave Propagation · Structural Health Monitoring Techniques
