TL;DR
This study shows that MP3 compression reduces adversarial noise in audio inputs, lowering character error rates on adversarial examples while leaving recognition of normal speech largely unaffected.
Contribution
The paper proposes MP3 compression as a method to diminish adversarial noise in audio samples for end-to-end speech recognition systems, validated with two objective metrics: Character Error Rate (CER) and Signal-to-Noise Ratio (SNR).
Findings
MP3 compression applied to adversarial examples reduces character error rates compared to uncompressed adversarial examples.
MP3 compression increases signal-to-noise ratio in reconstructed audio.
MP3 compression does not significantly affect normal speech recognition accuracy.
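The two metrics behind these findings can be sketched in a few lines. This is a minimal illustration using the standard textbook definitions, not the paper's evaluation code: CER is taken as the character-level Levenshtein distance divided by the reference length, and SNR as the ratio of signal power to the power of the difference from a clean reference, in dB. The function names `edit_distance`, `cer`, and `snr_db` are hypothetical, and the paper's own SNR estimate on audio reconstructed by feature inversion may differ in detail.

```python
import math


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (each edit costs 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: character edits divided by reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


def snr_db(clean, noisy):
    """SNR in dB, treating (noisy - clean) as the noise signal.

    Assumes both signals are time-aligned and of equal length.
    """
    if len(clean) != len(noisy):
        raise ValueError("signals must have the same length")
    signal_power = sum(c * c for c in clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy))
    if noise_power == 0:
        return math.inf
    return 10 * math.log10(signal_power / noise_power)
```

Under these definitions, a lower CER on MP3-compressed adversarial examples and a higher SNR after compression both point in the direction the findings describe.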
Abstract
Audio Adversarial Examples (AAE) represent specially created inputs meant to trick Automatic Speech Recognition (ASR) systems into misclassification. The present work proposes MP3 compression as a means to decrease the impact of Adversarial Noise (AN) in audio samples transcribed by ASR systems. To this end, we generated AAEs with the Fast Gradient Sign Method for an end-to-end, hybrid CTC-attention ASR system. Our method is then validated by two objective indicators: (1) Character Error Rates (CER) that measure the speech decoding performance of four ASR models trained on uncompressed, as well as MP3-compressed data sets and (2) Signal-to-Noise Ratio (SNR) estimated for both uncompressed and MP3-compressed AAEs that are reconstructed in the time domain by feature inversion. We found that MP3 compression applied to AAEs indeed reduces the CER when compared to uncompressed AAEs.…
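The Fast Gradient Sign Method mentioned in the abstract perturbs an input by a fixed step in the direction of the sign of the loss gradient. The sketch below shows that single step on a toy logistic-regression classifier with an analytic gradient. The toy model is an assumption made only to keep the example self-contained: the paper attacks a hybrid CTC-attention ASR system, and the names `logistic_loss`, `loss_gradient_wrt_input`, and `fgsm` are hypothetical.

```python
import math


def sign(v):
    """Return -1, 0, or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)


def logistic_loss(w, x, y):
    """Logistic loss log(1 + exp(-y * w.x)) for a label y in {-1, +1}."""
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    return math.log1p(math.exp(-margin))


def loss_gradient_wrt_input(w, x, y):
    """Analytic gradient of the logistic loss with respect to the input x."""
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    scale = -y / (1.0 + math.exp(margin))
    return [scale * wi for wi in w]


def fgsm(w, x, y, epsilon):
    """One FGSM step: x_adv = x + epsilon * sign(grad_x loss)."""
    grad = loss_gradient_wrt_input(w, x, y)
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad)]


# Toy usage: the perturbation is bounded by epsilon per coordinate
# and increases the loss on the true label.
w = [1.0, -2.0, 0.5]
x = [0.2, -0.1, 0.4]
x_adv = fgsm(w, x, y=1, epsilon=0.1)
```

Because every coordinate moves by at most `epsilon`, the adversarial noise stays small in amplitude, which is the kind of perturbation a lossy codec such as MP3 might plausibly attenuate.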
