SuPseudo: A Pseudo-supervised Learning Method for Neural Speech Enhancement in Far-field Speech Recognition

Longjie Luo; Lin Li; Qingyang Hong

arXiv:2505.24450 · cs.SD · June 24, 2025

TL;DR

SuPseudo is a pseudo-supervised learning approach that uses direct sound estimation (DSE) outputs as pseudo-labels, so that neural speech enhancement models can be trained directly on real-recorded far-field data and generalize better when used for far-field speech recognition.

Contribution

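The paper proposes SuPseudo, a pseudo-supervised learning method that uses direct sound estimation (DSE) outputs as pseudo-labels, allowing speech enhancement models to learn from real-recorded data that has no target speech annotations. It also introduces FARNET, a speech enhancement model designed to make full use of SuPseudo.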

Findings

01. SuPseudo outperforms previous state-of-the-art methods.

02. The approach improves generalization to real-world far-field data.

03. Experiments on the MISP2023 corpus validate its effectiveness.

Abstract

Due to the lack of target speech annotations in real-recorded far-field conversational datasets, speech enhancement (SE) models are typically trained on simulated data. However, the trained models often perform poorly in real-world conditions, hindering their application in far-field speech recognition. To address this issue, we (a) propose direct sound estimation (DSE) to estimate the oracle direct sound of real-recorded data for SE; and (b) present a novel pseudo-supervised learning method, SuPseudo, which leverages DSE estimates as pseudo-labels and enables SE models to directly learn from and adapt to real-recorded data, thereby improving their generalization capability. Furthermore, an SE model called FARNET is designed to fully utilize SuPseudo. Experiments on the MISP2023 corpus demonstrate the effectiveness of SuPseudo, and our system significantly outperforms the previous…
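The pseudo-labelling idea in the abstract can be illustrated with a toy training loop. This is only a sketch under loud assumptions: the abstract does not describe how DSE or FARNET work, so `placeholder_dse` (a moving-average filter) and the linear filter trained here are hypothetical stand-ins, and all function names are invented for illustration. The one thing it is meant to show is the data flow: a target is estimated from the real recording itself and then used as a supervised label.

```python
import numpy as np


def placeholder_dse(mixture: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for direct sound estimation (DSE).

    NOT the paper's algorithm: it just applies a 3-tap moving average
    so that the recording yields some target signal to learn from.
    """
    kernel = np.ones(3) / 3.0
    return np.convolve(mixture, kernel, mode="same")


def frame(signal: np.ndarray, width: int) -> np.ndarray:
    """Stack zero-padded sliding windows of `width` samples as rows."""
    half = width // 2
    padded = np.pad(signal, (half, half))
    return np.stack([padded[i:i + width] for i in range(len(signal))])


def train_on_pseudo_labels(recordings, width=5, lr=0.05, epochs=200):
    """Fit a linear filter (toy stand-in for an SE model) to pseudo-labels."""
    weights = np.zeros(width)
    losses = []
    for _ in range(epochs):
        epoch_loss = 0.0
        for mixture in recordings:
            # Pseudo-label: estimated from the recording, no annotation needed.
            target = placeholder_dse(mixture)
            features = frame(mixture, width)
            error = features @ weights - target
            # Gradient of the mean squared error with respect to the weights.
            gradient = 2.0 * features.T @ error / len(mixture)
            weights -= lr * gradient
            epoch_loss += float(np.mean(error ** 2))
        losses.append(epoch_loss / len(recordings))
    return weights, losses


# Synthetic "real recordings": white noise, used purely for illustration.
rng = np.random.default_rng(0)
recordings = [rng.standard_normal(256) for _ in range(4)]
weights, losses = train_on_pseudo_labels(recordings)
```

In this toy setting the model simply recovers the placeholder filter, which is all a pseudo-supervised learner can do when its labels come from a fixed estimator. The quality of the pseudo-label estimator therefore bounds what the trained model can learn.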

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis