Revisiting Acoustic Features for Robust ASR
Muhammad A. Shah, Bhiksha Raj

TL;DR
This paper evaluates biologically inspired acoustic features for robust ASR and introduces two new features, FreqMask and DoGSpec. Compared with traditional features, these improve robustness against noise and adversarial attacks with little or no loss in accuracy.
Contribution
It proposes two novel acoustic features, FreqMask and DoGSpec, inspired by neuro-psychological phenomena, demonstrating improved robustness over standard features.
Findings
DoGSpec outperforms LogMelSpec in robustness with minimal accuracy loss.
GammSpec improves accuracy and robustness to non-adversarial noise.
DoGSpec surpasses GammSpec against adversarial attacks.
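The summary names these features but does not define them. The sketch below is a rough illustration under stated assumptions. It computes a LogMelSpec-style baseline with NumPy, then applies a difference-of-Gaussians (DoG) filter along the frequency axis. A DoG filter is a common model of center-surround (lateral-inhibition) processing. Using it here is an assumption suggested by the name DoGSpec, not the authors' formulation. The helper names (`log_mel_spec`, `dog_kernel`, `dog_filter_freq`) are hypothetical, as are all parameter values: 16 kHz audio, 25 ms windows, 10 ms hop, 80 mel bands, and σ = 1 and 3 mel bins.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel scale
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    """Triangular filters mapping n_fft//2 + 1 linear bins to n_mels mel bands."""
    n_bins = n_fft // 2 + 1
    fft_freqs = np.linspace(0.0, sr / 2.0, n_bins)
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, n_bins))
    for m in range(n_mels):
        left, center, right = hz_points[m], hz_points[m + 1], hz_points[m + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel_spec(x, sr=16000, n_fft=400, hop=160, n_mels=80, eps=1e-6):
    """Log-mel spectrogram of a 1-D signal, shape (n_frames, n_mels)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop: i * hop + n_fft] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2
    mel = power @ mel_filterbank(n_mels, n_fft, sr).T
    return np.log(mel + eps)  # eps keeps the log finite in empty bands

def dog_kernel(sigma_center, sigma_surround, radius):
    """Center minus surround Gaussian, each normalized to sum to 1 (so the kernel sums to 0)."""
    t = np.arange(-radius, radius + 1, dtype=float)
    g_c = np.exp(-0.5 * (t / sigma_center) ** 2)
    g_s = np.exp(-0.5 * (t / sigma_surround) ** 2)
    return g_c / g_c.sum() - g_s / g_s.sum()

def dog_filter_freq(spec, sigma_center=1.0, sigma_surround=3.0, radius=8):
    """Convolve every frame of a (frames, bands) spectrogram with a DoG kernel along frequency."""
    k = dog_kernel(sigma_center, sigma_surround, radius)
    return np.stack([np.convolve(frame, k, mode="same") for frame in spec])

if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    x = np.sin(2 * np.pi * 440 * t)  # one second of a 440 Hz tone
    S = log_mel_spec(x, sr=sr)
    D = dog_filter_freq(S)
    print(S.shape, D.shape)  # (98, 80) (98, 80)
```

Each Gaussian is normalized before the subtraction, so the kernel sums to zero. As a result, spectrally flat regions are suppressed and local spectral peaks are emphasized. Whether the paper's DoGSpec uses this normalization, or filters along frequency at all, is not stated in this summary.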
Abstract
Automatic Speech Recognition (ASR) systems must be robust to the many types of noise present in real-world environments. These include environmental noise, room impulse responses, and special effects, as well as attacks by malicious actors (adversarial attacks). Recent works seek to improve accuracy and robustness by developing novel Deep Neural Networks (DNNs) and curating diverse training datasets for them, while using relatively simple acoustic features. This approach improves robustness to the types of noise present in the training data. However, it confers limited robustness against unseen noises and negligible robustness to adversarial attacks. In this paper, we revisit the approach of earlier works that developed acoustic features inspired by biological auditory perception for accurate and robust ASR. Specifically, we evaluate the ASR accuracy and…
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Underwater Acoustics Research
