Enhancing Automatic Speech Recognition Through Integrated Noise Detection Architecture

Karamvir Singh

arXiv:2512.08973 · cs.SD · December 11, 2025

TL;DR

This paper integrates a noise detection module into a wav2vec2-based speech recognition system and reports improved word error rate, character error rate, and noise detection accuracy over conventional architectures in challenging acoustic environments.

Contribution

It presents a novel architecture that combines noise detection with speech recognition, enhancing robustness in noisy conditions.

Findings

01 Improved word error rate and character error rate

02 Enhanced noise detection accuracy

03 Better performance in challenging acoustic environments
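The first finding refers to word error rate (WER) and character error rate (CER). As a minimal sketch of the standard definitions (edit distance between reference and hypothesis, divided by reference length), and not the paper's own evaluation code, the two metrics can be computed as follows. The function names `edit_distance`, `wer`, and `cer` are illustrative choices, and whitespace tokenization with no text normalization is an assumption.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum substitutions, insertions, and deletions."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis is i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate, assuming whitespace tokenization (an assumption here)."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate over raw characters, spaces included."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

Both rates can exceed 1.0 when the hypothesis contains many insertions, so they are error ratios and not bounded accuracies.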

Abstract

This research presents a novel approach to enhancing automatic speech recognition systems by integrating noise detection capabilities directly into the recognition architecture. Building upon the wav2vec2 framework, the proposed method incorporates a dedicated noise identification module that operates concurrently with speech transcription. Experimental validation using publicly available speech and environmental audio datasets demonstrates substantial improvements in transcription quality and noise discrimination. The enhanced system achieves superior performance in word error rate, character error rate, and noise detection accuracy compared to conventional architectures. Results indicate that joint optimization of transcription and noise classification objectives yields more reliable speech recognition in challenging acoustic conditions.
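The abstract describes joint optimization of transcription and noise classification objectives but does not state the loss. One common way to write such a multi-task objective, given here as an assumption and not as the paper's formulation, is a weighted sum of a transcription loss and a classification loss. The CTC term reflects how wav2vec2 models are typically fine-tuned for transcription; the weight $\lambda$, the noise label $y$, and the predicted noise probability $\hat{y}$ are hypothetical symbols introduced for illustration.

```latex
% Hypothetical joint objective; symbols are illustrative, not taken from the paper.
\mathcal{L}_{\text{joint}}
  = \mathcal{L}_{\text{CTC}} + \lambda \, \mathcal{L}_{\text{noise}},
\qquad
\mathcal{L}_{\text{noise}}
  = -\bigl[\, y \log \hat{y} + (1 - y) \log (1 - \hat{y}) \,\bigr]
```

Under this sketch, $\lambda = 0$ recovers a transcription-only model, and larger values of $\lambda$ push the shared encoder toward features that separate speech from noise.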

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing