A Comparative Evaluation of Deep Learning Models for Speech Enhancement in Real-World Noisy Environments

Md Jahangir Alam Khondkar; Ajan Ahmed; Stephanie Schuckers; and Masudul Haider Imtiaz

arXiv:2506.15000 · cs.SD · January 22, 2026

Open Access

TL;DR

This study benchmarks three deep learning models for speech enhancement in noisy environments, evaluating their noise suppression, perceptual quality, and speaker feature retention across multiple datasets.

Contribution

It provides a comprehensive comparative analysis of Wave-U-Net, CMGAN, and U-Net, highlighting their strengths and trade-offs in real-world speech enhancement tasks.

Findings

01. U-Net achieves the highest noise suppression, with SNR improvements of +71.96% on SpEAR, +64.83% on VPQAD, and +364.2% on the Clarkson dataset.

02. CMGAN attains the best perceptual quality, with the top PESQ scores.

03. Wave-U-Net balances noise suppression and speaker feature retention.

Abstract

Speech enhancement, particularly denoising, is vital for improving the intelligibility and quality of speech signals in real-world applications, especially in noisy environments. While prior research has introduced various deep learning models for this purpose, many struggle to balance noise suppression, perceptual quality, and speaker-specific feature preservation, leaving a critical gap in the comparative evaluation of their performance. This study benchmarks three state-of-the-art models (Wave-U-Net, CMGAN, and U-Net) on diverse datasets such as SpEAR, VPQAD, and Clarkson. These models were chosen for their relevance in the literature and the accessibility of their code. The evaluation reveals that U-Net achieves high noise suppression, with SNR improvements of +71.96% on SpEAR, +64.83% on VPQAD, and +364.2% on the Clarkson dataset. CMGAN outperforms in perceptual quality, attaining…
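The abstract reports SNR gains as percentages. The excerpt does not say how SNR was estimated or how the percentage was derived, so the following is only a minimal sketch of one plausible reading: SNR in dB computed against a clean reference signal, and the gain expressed as a relative change between the SNR before and after enhancement. The function names `snr_db` and `relative_improvement_pct` and the synthetic signals are illustrative assumptions, not the authors' code, and real-world recordings without a clean reference would need a different SNR estimator.

```python
import math


def snr_db(clean, observed):
    """SNR in dB of `observed` against a clean reference (assumed available)."""
    signal_power = sum(c * c for c in clean)
    noise_power = sum((o - c) ** 2 for c, o in zip(clean, observed))
    if noise_power == 0:
        return math.inf  # no residual noise at all
    return 10.0 * math.log10(signal_power / noise_power)


def relative_improvement_pct(before, after):
    """Relative change from `before` to `after`, in percent."""
    if before == 0:
        raise ValueError("relative improvement is undefined when before == 0")
    return 100.0 * (after - before) / abs(before)


# Synthetic stand-ins: a sine "speech" signal plus deterministic "noise".
clean = [math.sin(0.05 * i) for i in range(2000)]
noise = [0.1 * math.cos(0.37 * i) for i in range(2000)]
noisy = [c + n for c, n in zip(clean, noise)]
# Hypothetical enhancer that halves the noise amplitude.
enhanced = [c + 0.5 * n for c, n in zip(clean, noise)]

snr_before = snr_db(clean, noisy)
snr_after = snr_db(clean, enhanced)
gain_pct = relative_improvement_pct(snr_before, snr_after)
print(f"SNR before: {snr_before:.2f} dB, after: {snr_after:.2f} dB, gain: {gain_pct:+.2f}%")
```

Under this reading the percentage depends strongly on the starting SNR: the same absolute gain in dB yields a much larger percentage when the input SNR is low, which is one possible reason a figure such as +364.2% can appear next to gains of about 65 to 72% on other datasets.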

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Indoor and Outdoor Localization Technologies