Mixture to Beamformed Mixture: Leveraging Beamformed Mixture as Weak-Supervision for Speech Enhancement and Noise-Robust ASR

Zhong-Qiu Wang; Ruizhe Pang

arXiv:2507.15229 · eess.AS · July 22, 2025


TL;DR

This paper introduces a training approach for speech enhancement and noise-robust ASR that uses beamformed mixtures as weak supervision. Because the beamformed mixture has a higher SNR of the target speaker than the input mixture, models can be trained on real recordings, which improves real-world performance.

Contribution

The paper proposes using beamformed mixtures as weak supervision to train neural networks. This improves generalization to real recorded noisy speech compared with training only on simulated mixtures.

Findings

1. Improved speech enhancement on real-recorded datasets
2. Better noise robustness in ASR systems
3. Effective training with pairs of real-recorded mixtures and their beamformed mixtures

Abstract

In multi-channel speech enhancement and robust automatic speech recognition (ASR), beamforming typically improves the signal-to-noise ratio (SNR) of the target speaker and produces reliable enhancement with little distortion to the target speech. Building on this observation, we propose to leverage the beamformed mixture, which has a higher SNR of the target speaker than the input mixture, as weak supervision to train deep neural networks (DNNs) to enhance the input mixture. This way, we can train enhancement models using pairs of a real-recorded mixture and its beamformed mixture, and potentially achieve better generalization to real mixtures than training the models only on simulated mixtures, which usually mismatch real mixtures. Evaluation results on the real-recorded CHiME-4 dataset show the effectiveness of the proposed algorithm.
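The abstract's premise, that beamforming raises the target speaker's SNR above that of the input mixture, can be checked with a toy simulation. This is a minimal sketch under loudly stated assumptions, not the authors' method: it uses a simple delay-and-sum beamformer (the abstract does not say which beamformer is used), assumes the speech is already time-aligned across microphones, and models noise as spatially uncorrelated white noise. The names `snr_db`, `delay_and_sum`, and `training_pair` are hypothetical.

```python
import numpy as np


def snr_db(reference, observed):
    """SNR of `observed` relative to the known `reference` signal, in dB."""
    residual = observed - reference
    return 10.0 * np.log10(np.sum(reference ** 2) / np.sum(residual ** 2))


def delay_and_sum(mixture):
    """Toy delay-and-sum beamformer.

    Assumption: channels are already time-aligned to the target speaker,
    so no steering delays are applied and the channels are simply averaged.
    """
    return mixture.mean(axis=0)


rng = np.random.default_rng(0)
num_mics, num_samples = 6, 16000

# Synthetic stand-ins: the same "speech" reaches every mic (perfect alignment),
# and each mic picks up independent noise of equal power.
speech = rng.standard_normal(num_samples)
noise = rng.standard_normal((num_mics, num_samples))
mixture = speech[np.newaxis, :] + noise  # shape: (num_mics, num_samples)

beamformed = delay_and_sum(mixture)
input_snr = snr_db(speech, mixture[0])  # SNR at reference mic 0
beamformed_snr = snr_db(speech, beamformed)
print(f"input SNR: {input_snr:.2f} dB, beamformed SNR: {beamformed_snr:.2f} dB")

# Hypothetical weakly supervised training pair: the DNN input is the
# multi-channel mixture and the (still noisy, higher-SNR) target is the
# beamformed mixture. No clean speech is needed to form this pair.
training_pair = (mixture, beamformed)
```

Under these idealized assumptions, averaging six channels divides the noise power by six, so the SNR gain should land near 10·log10(6) ≈ 7.8 dB. The beamformed signal still contains residual noise, which is consistent with treating it as weak supervision rather than as a clean target.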
