Pseudo Labels-based Neural Speech Enhancement for the AVSR Task in the MISP-Meeting Challenge

Longjie Luo; Shenghui Lu; Lin Li; Qingyang Hong

arXiv:2505.24446 · cs.SD · June 24, 2025

TL;DR

This paper describes a system for the MISP-Meeting Challenge that trains a speech enhancement model on signal-level pseudo labels and uses multimodal information to improve speech recognition in noisy, reverberant meeting recordings, reaching CERs of 5.44% on the Dev set and 9.52% on the Eval set.

Contribution

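The authors propose G-SpatialNet, a speech enhancement model that improves Guided Source Separation (GSS) signals, and TLS, a framework that generates signal-level pseudo labels for real-recorded far-field audio. They also explore fine-tuning strategies, data augmentation, and multimodal information for pre-trained ASR models in meeting scenarios.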

Findings

01. Achieved character error rates (CERs) of 5.44% on the Dev set and 9.52% on the Eval set.

02. Secured second place in the MISP-Meeting Challenge.

03. Delivered relative improvements of 64.8% (Dev) and 52.6% (Eval) over the baseline.
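
For readers unfamiliar with the metrics in the findings, here is a minimal sketch of how a character error rate and a relative improvement are conventionally computed. The function names are illustrative, and the challenge's official scoring script may normalize text differently, so treat this as a definition aid rather than the scoring used in the paper.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance: insertions, deletions, substitutions all cost 1."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # delete a reference character
                curr[j - 1] + 1,         # insert a hypothesis character
                prev[j - 1] + (r != h),  # substitute, or match at zero cost
            ))
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character error rate: edit distance divided by reference length."""
    if len(ref) == 0:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def relative_improvement(baseline, system):
    """Fraction of the baseline error rate removed by the system."""
    return (baseline - system) / baseline
```

A relative improvement is measured against the baseline's own error rate, so it is not the same as the absolute difference in CER points.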

Abstract

This paper presents our system for the MISP-Meeting Challenge Track 2. The primary difficulty lies in the dataset, which contains strong background noise, reverberation, overlapping speech, and diverse meeting topics. To address these issues, we (a) designed G-SpatialNet, a speech enhancement (SE) model to improve Guided Source Separation (GSS) signals; (b) proposed TLS, a framework comprising time alignment, level alignment, and signal-to-noise ratio filtering, to generate signal-level pseudo labels for real-recorded far-field audio data, thereby facilitating SE models' training; and (c) explored fine-tuning strategies, data augmentation, and multimodal information to enhance the performance of pre-trained Automatic Speech Recognition (ASR) models in meeting scenarios. Finally, our system achieved character error rates (CERs) of 5.44% and 9.52% on the Dev and Eval sets, respectively,…
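
The abstract names the three TLS stages but gives no implementation details. The sketch below is one possible reading, not the authors' method. It assumes a cleaner reference recording of the same utterance is available (the `ref` argument, which is an assumption made here), aligns it to the far-field signal by cross-correlation, scales it with a least-squares gain, and keeps the pair only if the estimated SNR clears a threshold. All function names, parameters, and default values are hypothetical.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero and log of zero


def time_align(ref, far, max_lag):
    """Shift `ref` to best match `far`, searching lags within +/- max_lag samples."""
    n = min(len(ref), len(far))
    ref = np.asarray(ref[:n], dtype=float)
    far = np.asarray(far[:n], dtype=float)
    # Direct cross-correlation; fine for a sketch, slow for long recordings.
    xcorr = np.correlate(far, ref, mode="full")
    lags = np.arange(-(n - 1), n)  # lag value for each xcorr entry
    mask = np.abs(lags) <= max_lag
    lag = int(lags[mask][np.argmax(xcorr[mask])])
    aligned = np.roll(ref, lag)
    # Zero the samples that np.roll wrapped around.
    if lag > 0:
        aligned[:lag] = 0.0
    elif lag < 0:
        aligned[lag:] = 0.0
    return aligned, far, lag


def level_gain(ref, far):
    """Least-squares scalar g minimizing ||far - g * ref||^2."""
    return float(np.dot(far, ref) / (np.dot(ref, ref) + EPS))


def estimate_snr_db(label, far):
    """Treat `label` as signal and the residual `far - label` as noise."""
    noise = far - label
    return float(10.0 * np.log10((np.sum(label ** 2) + EPS)
                                 / (np.sum(noise ** 2) + EPS)))


def tls_pseudo_label(ref, far, max_lag=800, min_snr_db=0.0):
    """Return (label, lag, gain, snr_db, keep) for one utterance pair."""
    aligned, far, lag = time_align(ref, far, max_lag)
    gain = level_gain(aligned, far)
    label = gain * aligned
    snr = estimate_snr_db(label, far)
    return label, lag, gain, snr, bool(snr >= min_snr_db)
```

In this reading, pairs with `keep == False` would be dropped from the SE training set, since a low estimated SNR suggests the scaled reference explains little of the far-field signal and would make a poor training target.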
