Mel-McNet: A Mel-Scale Framework for Online Multichannel Speech Enhancement

Yujie Yang; Bing Yang; Xiaofei Li

arXiv:2505.19576·eess.AS·May 27, 2025

TL;DR

This paper introduces Mel-McNet, a Mel-scale framework for online multichannel speech enhancement. On CHiME-3 it reduces computational complexity by 60% relative to the original McNet while keeping enhancement and ASR performance comparable, and it operates on a frequency scale that better matches human auditory perception.

Contribution

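The work presents a Mel-scale processing framework built from two components: an STFT-to-Mel module that compresses multichannel STFT features into Mel-frequency representations, and a modified McNet backbone that operates directly in the Mel domain and outputs enhanced LogMel spectra.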

Findings

01 Reduces computational complexity by 60% on CHiME-3

02 Maintains enhancement and ASR performance comparable to the original McNet

03 Outperforms other state-of-the-art methods

Abstract

Online multichannel speech enhancement has been studied intensively in recent years. Although the Mel-frequency scale matches human auditory perception better than the linear frequency scale and is more computationally efficient, few methods operate in the Mel-frequency domain. To address this, this work proposes a Mel-scale framework named Mel-McNet. It processes spectral and spatial information with two key components: an STFT-to-Mel module that compresses multichannel STFT features into Mel-frequency representations, and a modified McNet backbone that operates directly in the Mel domain to generate enhanced LogMel spectra. These spectra can be fed directly to vocoders for waveform reconstruction or to ASR systems for transcription. Experiments on CHiME-3 show that Mel-McNet reduces computational complexity by 60% while maintaining enhancement and ASR performance comparable to the original McNet.…
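To make the idea of compressing multichannel STFT features into a Mel-frequency representation concrete, here is a minimal sketch that is not taken from the paper. It assumes a fixed triangular Mel filterbank applied to the per-channel power spectrum, whereas the abstract does not say how the STFT-to-Mel module is built. The function names, the FFT size (512), the number of Mel bands (80), the sample rate (16 kHz), and the random 6-channel input are all illustrative assumptions.

```python
import numpy as np


def hz_to_mel(hz):
    """Convert Hz to Mel using the common 2595 * log10(1 + f / 700) formula."""
    return 2595.0 * np.log10(1.0 + np.asarray(hz, dtype=float) / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(n_fft, n_mels, sample_rate):
    """Build triangular Mel filters with shape (n_mels, n_fft // 2 + 1)."""
    n_bins = n_fft // 2 + 1
    bin_hz = np.linspace(0.0, sample_rate / 2.0, n_bins)
    # n_mels + 2 edge frequencies, equally spaced on the Mel scale.
    edges_hz = mel_to_hz(
        np.linspace(0.0, hz_to_mel(sample_rate / 2.0), n_mels + 2)
    )
    fb = np.zeros((n_mels, n_bins))
    for m in range(n_mels):
        lo, mid, hi = edges_hz[m], edges_hz[m + 1], edges_hz[m + 2]
        rising = (bin_hz - lo) / (mid - lo)
        falling = (hi - bin_hz) / (hi - mid)
        # Triangle: rises from lo to mid, falls from mid to hi, zero elsewhere.
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def stft_to_logmel(stft, fb, eps=1e-8):
    """Map a complex STFT (channels, bins, frames) to LogMel (channels, mels, frames)."""
    power = np.abs(stft) ** 2
    mel = np.einsum("mf,cft->cmt", fb, power)
    return np.log(mel + eps)


# Illustrative input only: random 6-channel STFT with 257 bins and 10 frames.
rng = np.random.default_rng(0)
stft = rng.standard_normal((6, 257, 10)) + 1j * rng.standard_normal((6, 257, 10))

fb = mel_filterbank(n_fft=512, n_mels=80, sample_rate=16000)
logmel = stft_to_logmel(stft, fb)
print(fb.shape, logmel.shape)
```

With these assumed sizes the frequency axis shrinks from 257 STFT bins to 80 Mel bands, which illustrates why a backbone operating in the Mel domain has less to process per frame. Note that this sketch keeps only power and discards phase, so it drops the inter-channel phase cues that a multichannel method would normally exploit; how Mel-McNet handles spatial information in its module is not specified in the abstract.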

Code & Models

Repositories

audio-westlakeu/mel-mcnet (PyTorch, official)

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis