CleanMel: Mel-Spectrogram Enhancement for Improving Both Speech Quality and ASR

Nian Shao; Rui Zhou; Pengyu Wang; Xian Li; Ying Fang; Yujie Yang; Xiaofei Li

arXiv:2502.20040 · eess.AS · July 31, 2025

TL;DR

CleanMel is a single-channel neural network that denoises and dereverberates speech in the Mel-spectrogram domain, interleaving cross-band and narrow-band processing, to improve both speech quality and automatic speech recognition (ASR).

Contribution

The paper introduces a single-channel Mel-spectrogram enhancement network built from interleaved cross-band and narrow-band processing, aimed at improving both speech quality and ASR performance relative to linear-frequency-domain and time-domain speech enhancement.
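
The interleaving idea can be illustrated with a toy sketch. It is not the CleanMel architecture: the two functions below are hand-written smoothing filters standing in for learned layers, and the function names, the block order, and the number of blocks are all assumptions made for illustration. The only point is which axis each step works along: the cross-band step mixes Mel bands within one frame, while the narrow-band step processes each band's time sequence on its own with the same rule.

```python
# Toy illustration only: smoothing filters stand in for learned layers.
# A spectrogram is a list of Mel bands, each a list of frame values: spec[band][frame].

def cross_band(spec):
    """Mix neighbouring Mel bands within each frame (3-tap average over frequency)."""
    n_bands, n_frames = len(spec), len(spec[0])
    out = [[0.0] * n_frames for _ in range(n_bands)]
    for t in range(n_frames):
        for f in range(n_bands):
            neighbours = [spec[g][t] for g in (f - 1, f, f + 1) if 0 <= g < n_bands]
            out[f][t] = sum(neighbours) / len(neighbours)
    return out

def narrow_band(spec):
    """Process every band independently along time with one shared rule
    (causal 2-tap average); no information crosses between bands."""
    return [
        [band[t] if t == 0 else 0.5 * (band[t] + band[t - 1]) for t in range(len(band))]
        for band in spec
    ]

def interleaved(spec, n_blocks=2):
    """Alternate the two steps; order and n_blocks are hypothetical choices."""
    for _ in range(n_blocks):
        spec = narrow_band(cross_band(spec))
    return spec

mel = [[0.0, 1.0, 0.0, 0.0],
       [0.0, 0.0, 0.0, 0.0],
       [0.0, 0.0, 0.0, 0.0]]
out = interleaved(mel)
print(len(out), len(out[0]))  # 3 4
```

Both steps keep the band-by-frame shape, which is what allows them to be stacked alternately.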

Findings

01. Significant improvements in speech quality metrics.
02. Enhanced ASR accuracy across multiple datasets.
03. Effective in both denoising and dereverberation tasks.

Abstract

In this work, we propose CleanMel, a single-channel Mel-spectrogram denoising and dereverberation network for improving both speech quality and automatic speech recognition (ASR) performance. The proposed network takes as input the noisy and reverberant microphone recording and predicts the corresponding clean Mel-spectrogram. The enhanced Mel-spectrogram can be either transformed to the speech waveform with a neural vocoder or directly used for ASR. The proposed network is composed of interleaved cross-band and narrow-band processing in the Mel-frequency domain, for learning the full-band spectral pattern and the narrow-band properties of signals, respectively. Compared to linear-frequency domain or time-domain speech enhancement, the key advantage of Mel-spectrogram enhancement is that Mel-frequency presents speech in a more compact way and thus is easier to learn, which will benefit…
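
The compactness of the Mel-frequency representation mentioned in the abstract can be made concrete with a small sketch. The code below builds a triangular Mel filterbank from scratch and maps one linear-frequency power frame to Mel bands. Everything numeric here is an assumption chosen for illustration, not a setting taken from the paper: the HTK-style Mel formula, the 512-point FFT, the 16 kHz sample rate, and the 80 Mel bands.

```python
import math

def hz_to_mel(hz):
    # HTK-style Mel formula; one common convention, assumed here.
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sample_rate):
    """Return n_mels triangular filters, each a list of n_fft // 2 + 1 weights."""
    n_bins = n_fft // 2 + 1
    mel_max = hz_to_mel(sample_rate / 2.0)
    # n_mels + 2 edge points, equally spaced on the Mel scale from 0 Hz to Nyquist.
    edges_hz = [mel_to_hz(i * mel_max / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_bins)]
    bank = []
    for m in range(1, n_mels + 1):
        lo, mid, hi = edges_hz[m - 1], edges_hz[m], edges_hz[m + 1]
        row = []
        for f in bin_hz:
            rising = (f - lo) / (mid - lo)
            falling = (hi - f) / (hi - mid)
            row.append(max(0.0, min(rising, falling)))
        bank.append(row)
    return bank

def apply_filterbank(bank, power_frame):
    """Project one linear-frequency power frame onto the Mel bands."""
    return [sum(w * p for w, p in zip(row, power_frame)) for row in bank]

# Hypothetical settings, not taken from the paper.
N_FFT, SAMPLE_RATE, N_MELS = 512, 16000, 80
bank = mel_filterbank(N_MELS, N_FFT, SAMPLE_RATE)
flat_frame = [1.0] * (N_FFT // 2 + 1)
mel_frame = apply_filterbank(bank, flat_frame)
print(len(flat_frame), "->", len(mel_frame))  # 257 -> 80
```

Under these assumed settings each frame shrinks from 257 linear-frequency bins to 80 Mel bands, so a network predicting the Mel-spectrogram has fewer values per frame to estimate.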

Code & Models

Repositories

Audio-WestlakeU/CleanMel (PyTorch, official)

Models

🤗 WestlakeAudioLab/CleanMel (model)

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Emotion and Mood Recognition