SuPseudo: A Pseudo-supervised Learning Method for Neural Speech Enhancement in Far-field Speech Recognition
Longjie Luo, Lin Li, Qingyang Hong

TL;DR
SuPseudo introduces a pseudo-supervised learning approach that uses direct sound estimation to improve neural speech enhancement models for far-field speech recognition, significantly improving their performance on real-recorded data.
Contribution
The paper proposes a novel pseudo-supervised learning method, SuPseudo, which uses direct sound estimates as pseudo-labels so that speech enhancement models can learn directly from real-recorded data.
Findings
SuPseudo outperforms previous state-of-the-art methods.
The approach improves generalization to real-world far-field data.
Experiments on the MISP2023 corpus validate its effectiveness.
Abstract
Due to the lack of target speech annotations in real-recorded far-field conversational datasets, speech enhancement (SE) models are typically trained on simulated data. However, the trained models often perform poorly in real-world conditions, hindering their application in far-field speech recognition. To address the issue, we (a) propose direct sound estimation (DSE) to estimate the oracle direct sound of real-recorded data for SE; and (b) present a novel pseudo-supervised learning method, SuPseudo, which leverages DSE-estimates as pseudo-labels and enables SE models to directly learn from and adapt to real-recorded data, thereby improving their generalization capability. Furthermore, an SE model called FARNET is designed to fully utilize SuPseudo. Experiments on the MISP2023 corpus demonstrate the effectiveness of SuPseudo, and our system significantly outperforms the previous…
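The abstract says that SE models are trained on real recordings using DSE estimates as pseudo-labels, but it does not specify the training objective or whether simulated data is also used. The sketch below is a minimal illustration of that idea under stated assumptions: an SI-SDR objective, and a weighted sum of a supervised term on simulated pairs and a pseudo-supervised term on real recordings. `pseudo_supervised_loss`, `dse_estimator`, and `real_weight` are hypothetical names. This is not the authors' SuPseudo, DSE, or FARNET implementation.

```python
import numpy as np


def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant signal-to-distortion ratio in dB (higher is better)."""
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Project the estimate onto the reference to obtain the scaled target component.
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference
    residual = estimate - target
    return 10.0 * np.log10((np.dot(target, target) + eps) / (np.dot(residual, residual) + eps))


def pseudo_supervised_loss(se_model, sim_batch, real_batch, dse_estimator, real_weight=1.0):
    """Hypothetical combined objective (an assumption, not the paper's loss).

    sim_batch:  list of (noisy, target) pairs from simulation, with true labels.
    real_batch: list of real noisy recordings, with no annotations.
    dse_estimator: hypothetical stand-in for DSE that returns a pseudo-label.
    """
    # Supervised term: simulated data carries a known target signal.
    sim_loss = np.mean([-si_sdr(se_model(noisy), target) for noisy, target in sim_batch])
    # Pseudo-supervised term: the DSE-style estimate replaces the missing label.
    real_loss = np.mean([-si_sdr(se_model(noisy), dse_estimator(noisy)) for noisy in real_batch])
    return float(sim_loss + real_weight * real_loss)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.standard_normal(16000)
    noisy = clean + rng.standard_normal(16000)
    identity = lambda x: x  # toy placeholder for both the SE model and the estimator
    print(round(si_sdr(noisy, clean), 1))  # roughly 0 dB at this noise level
    print(pseudo_supervised_loss(identity, [(noisy, clean)], [noisy], identity, real_weight=0.5))
```

This NumPy version only evaluates the loss value. An actual training loop would compute it in an autodiff framework and would typically treat the pseudo-labels as fixed targets (precomputed or excluded from gradient flow), so that the SE model adapts toward them rather than the labels drifting toward the model's output.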
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis
