Enhancing Automatic Speech Recognition Through Integrated Noise Detection Architecture
Karamvir Singh

TL;DR
This paper integrates a noise detection module into a wav2vec2-based speech recognition system, reporting improved transcription accuracy and noise discrimination in challenging acoustic environments.
Contribution
It presents a novel architecture that combines noise detection with speech recognition, enhancing robustness in noisy conditions.
Findings
Lower word error rate (WER) and character error rate (CER)
Enhanced noise detection accuracy
Better performance in challenging acoustic environments
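Word error rate (WER) and character error rate (CER) are both edit-distance ratios: (substitutions + deletions + insertions) divided by the number of words or characters in the reference transcript, so lower is better. The following is a minimal Python sketch of these standard definitions. It is not the paper's evaluation code, which the summary does not describe. Treating spaces as characters for CER is an assumption, since toolkits differ on this.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # deletion of r
                cur[j - 1] + 1,          # insertion of h
                prev[j - 1] + (r != h),  # substitution (cost 0 on a match)
            )
        prev = cur
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref = reference.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref, hypothesis.split()) / len(ref)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate; assumption: spaces count as characters."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)
```

For example, `wer("the cat sat on the mat", "the cat sat on mat")` is 1/6 (one deleted word out of six), and `cer("the cat", "the bat")` is 1/7. Because insertions are counted, WER can exceed 1.0 when the hypothesis contains many spurious words.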
Abstract
This research presents a novel approach to enhancing automatic speech recognition systems by integrating noise detection capabilities directly into the recognition architecture. Building upon the wav2vec2 framework, the proposed method incorporates a dedicated noise identification module that operates concurrently with speech transcription. Experimental validation using publicly available speech and environmental audio datasets demonstrates substantial improvements in transcription quality and noise discrimination. The enhanced system achieves superior performance in word error rate, character error rate, and noise detection accuracy compared to conventional architectures. Results indicate that joint optimization of transcription and noise classification objectives yields more reliable speech recognition in challenging acoustic conditions.
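The joint optimization of transcription and noise classification can be illustrated with a minimal sketch. The abstract does not specify the losses, heads, or weighting, so every choice here is an assumption: a CTC transcription loss (the objective commonly used when fine-tuning wav2vec2 for ASR), a hypothetical mean-pooled linear noise classifier, and a hypothetical weight `lam` combining them as L = L_CTC + lam * L_noise. Both heads read the same per-frame encoder features, standing in for wav2vec2 outputs. The pure-Python CTC forward algorithm keeps the example dependency-free but is far slower than a framework implementation.

```python
import math
from typing import List, Sequence, Tuple

BLANK = 0  # assumption: vocabulary index 0 is the CTC blank symbol
Head = Tuple[Sequence[Sequence[float]], Sequence[float]]  # (weight rows, bias)


def log_softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def logaddexp(a: float, b: float) -> float:
    # Stable log(exp(a) + exp(b)); -inf represents probability 0.
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def linear(x: Sequence[float], head: Head) -> List[float]:
    weights, bias = head
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def ctc_loss(frame_logits: Sequence[Sequence[float]], target: Sequence[int]) -> float:
    """CTC negative log-likelihood via the forward algorithm in log space."""
    log_probs = [log_softmax(f) for f in frame_logits]
    ext = [BLANK]  # target interleaved with blanks: _ y1 _ y2 _ ...
    for tok in target:
        ext += [tok, BLANK]
    S = len(ext)
    alpha = [-math.inf] * S
    alpha[0] = log_probs[0][BLANK]
    if S > 1:
        alpha[1] = log_probs[0][ext[1]]
    for lp in log_probs[1:]:
        new = []
        for s in range(S):
            acc = alpha[s]  # stay on the same symbol
            if s >= 1:
                acc = logaddexp(acc, alpha[s - 1])  # advance by one symbol
            if s >= 2 and ext[s] != BLANK and ext[s] != ext[s - 2]:
                acc = logaddexp(acc, alpha[s - 2])  # skip a blank between distinct labels
            new.append(acc + lp[ext[s]])
        alpha = new
    # Valid alignments end on the last label or on the trailing blank.
    total = alpha[-1] if S == 1 else logaddexp(alpha[-1], alpha[-2])
    return -total


def noise_loss(features: Sequence[Sequence[float]], noise_label: int, noise_head: Head) -> float:
    # Hypothetical noise head: mean-pool frames, then a linear classifier + cross-entropy.
    pooled = [sum(col) / len(features) for col in zip(*features)]
    return -log_softmax(linear(pooled, noise_head))[noise_label]


def joint_loss(features, target, noise_label, ctc_head: Head, noise_head: Head, lam: float = 0.5) -> float:
    # Both heads share the same encoder frames; lam is a hypothetical weighting.
    frame_logits = [linear(f, ctc_head) for f in features]
    return ctc_loss(frame_logits, target) + lam * noise_loss(features, noise_label, noise_head)
```

For instance, with zero-initialized heads and a two-class vocabulary (blank plus one token), the CTC term for a two-frame input and a one-token target equals -log(0.75): three of the four equally likely alignments (token-token, token-blank, blank-token) collapse to that token. Minimizing the summed loss updates the shared encoder with gradients from both objectives, which is the general mechanism by which multi-task training can shape representations toward noise-robust transcription.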
Taxonomy
Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing
