When De-noising Hurts: A Systematic Study of Speech Enhancement Effects on Modern Medical ASR Systems

Sujal Chondhekar; Vasanth Murukuri; Rushabh Vasani; Sanika Goyal; Rajshree Badami; Anushree Rana; Sanjana SN; Karthik Pandia; Sulabh Katiyar; Neha Jagadeesh; Sankalp Gulati

arXiv:2512.17562·cs.SD·December 22, 2025

TL;DR

This study systematically evaluates the impact of speech enhancement on modern medical ASR systems and finds that denoising often degrades recognition performance, challenging conventional assumptions about noise reduction benefits.

Contribution

The study provides the first comprehensive analysis showing that traditional speech enhancement can harm modern large-scale ASR models in medical settings.

Findings

01. Enhanced audio increases semWER in all tested configurations.

02. Modern ASR models are inherently robust to noise without preprocessing.

03. Speech enhancement may remove features critical for accurate recognition.

Abstract

Speech enhancement methods are commonly believed to improve the performance of automatic speech recognition (ASR) in noisy environments. However, the effectiveness of these techniques cannot be taken for granted in the case of modern large-scale ASR models trained on diverse, noisy data. We present a systematic evaluation of MetricGAN-plus-voicebank denoising on four state-of-the-art ASR systems (OpenAI Whisper, NVIDIA Parakeet, Google Gemini Flash 2.0, and Parrotlet-a) using 500 medical speech recordings under nine noise conditions. ASR performance is measured using semantic WER (semWER), a normalized word error rate (WER) metric that accounts for domain-specific normalizations. Our results reveal a counterintuitive finding: speech enhancement preprocessing degrades ASR performance across all noise conditions and models. Original noisy audio achieves lower semWER than enhanced audio in all 40…
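
To make the idea of a normalized WER concrete, the following is a minimal Python sketch of a word error rate computed after text normalization. It is not the paper's semWER implementation: the normalization rules, the `ABBREVIATIONS` table, and the function names are all hypothetical and chosen only for illustration, since the abstract does not list the actual domain-specific rules.

```python
import re

# Hypothetical abbreviation table (an assumption for illustration only).
# The real semWER normalization rules are not specified in the abstract.
ABBREVIATIONS = {"mg": "milligrams", "bp": "blood pressure"}


def normalize(text):
    """Lowercase, strip punctuation, and expand abbreviations into word tokens."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    tokens = []
    for tok in text.split():
        # An expansion may produce several words, so split it again.
        tokens.extend(ABBREVIATIONS.get(tok, tok).split())
    return tokens


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def normalized_wer(reference, hypothesis):
    """WER computed on normalized token sequences."""
    ref = normalize(reference)
    hyp = normalize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref, hyp) / len(ref)
```

With this sketch, a transcript that differs from the reference only in casing, punctuation, or a listed abbreviation scores zero errors, which is the general motivation for normalizing before scoring. Comparing noisy and enhanced audio then amounts to computing this score on each system's transcript of both versions of a recording.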

Taxonomy

Topics: Speech Recognition and Synthesis · Voice and Speech Disorders · Speech and Audio Processing