Self-Supervised Learning-Based Source Separation for Meeting Data

Yuang Li; Xianrui Zheng; Philip C. Woodland

arXiv:2304.00871 · eess.AS · April 4, 2023 · ICASSP · 1 citation

Open Access

TL;DR

This paper evaluates self-supervised learning models for source separation in meeting scenarios, proposing a novel integration method with ASR and demonstrating improved transcription accuracy on real-world data.

Contribution

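The paper compares seven self-supervised learning (SSL) models on real and simulated data, introduces an iterative source selection method for integrating source separation with ASR, and adapts mixture invariant training to the time-frequency domain to improve real-world performance.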

Findings

01. Reduced cpWER-us by 1.9% absolute on the AMI dev set

02. Reduced cpWER-us by 1.5% absolute on the AMI test set

03. Showed that inserting source separation before an ASR model fine-tuned on separated speech improves transcription of real meeting data
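
The findings above are reported in terms of a concatenated minimum-permutation word error rate (cpWER). As a rough illustration of how plain cpWER is commonly computed, the sketch below concatenates each speaker's utterances, scores every speaker permutation with a word-level edit distance, and keeps the best one. The function names `edit_distance` and `cp_wer` are hypothetical, and the "-us" variant used on this page is not defined here, so this sketch does not attempt to reproduce it.

```python
from itertools import permutations


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,             # deletion of a reference word
                cur[j - 1] + 1,          # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # substitution, or match at zero cost
            ))
        prev = cur
    return prev[-1]


def cp_wer(refs, hyps):
    """refs, hyps: one list of utterance strings per speaker (illustrative layout)."""
    ref_streams = [" ".join(utts).split() for utts in refs]
    hyp_streams = [" ".join(utts).split() for utts in hyps]
    # Pad the shorter side with empty streams so speaker counts match.
    n = max(len(ref_streams), len(hyp_streams))
    ref_streams += [[] for _ in range(n - len(ref_streams))]
    hyp_streams += [[] for _ in range(n - len(hyp_streams))]
    total_ref_words = sum(len(r) for r in ref_streams)
    if total_ref_words == 0:
        raise ValueError("reference contains no words")
    # Try every speaker assignment and keep the one with the fewest errors.
    best_errors = min(
        sum(edit_distance(r, hyp_streams[p]) for r, p in zip(ref_streams, perm))
        for perm in permutations(range(n))
    )
    return best_errors / total_ref_words


refs = [["hello world", "how are you"], ["good morning"]]
hyps = [["good evening"], ["hello world how are"]]
print(cp_wer(refs, hyps))  # 2 errors over 7 reference words
```

The exhaustive permutation search is factorial in the number of speakers, which is acceptable for a handful of meeting participants; larger speaker counts would call for an assignment solver instead.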

Abstract

Source separation can improve automatic speech recognition (ASR) in multi-party meeting scenarios by extracting single-speaker signals from overlapped speech. Despite the success of self-supervised learning (SSL) models in single-channel source separation, most studies have focused on simulated setups. In this paper, seven SSL models were compared on both simulated and real-world corpora. Then, we propose to integrate the best-performing model, WavLM, into an automatic transcription system through a novel iterative source selection method. To improve real-world performance, time-domain unsupervised mixture invariant training was adapted to the time-frequency domain. Experiments showed that in the transcription system, when source separation was inserted before an ASR model fine-tuned on separated speech, absolute reductions of 1.9% and 1.5% in concatenated minimum-permutation word error rate…
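
The abstract mentions time-domain unsupervised mixture invariant training (MixIT). As a minimal sketch of the general time-domain idea only: a separator is fed the sum of two mixtures, and the loss is taken over the best assignment of each estimated source to one of the two original mixtures. The names `mse` and `mixit_loss` are hypothetical, and mean squared error is used here as a simple stand-in loss, which is an assumption. The paper's time-frequency adaptation is not described on this page and is not reproduced.

```python
from itertools import product


def mse(a, b):
    """Mean squared error between two equal-length signals (stand-in loss)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def mixit_loss(mix1, mix2, est_sources):
    """Return (loss, assignment) for the best source-to-mixture assignment.

    est_sources are the separator outputs for the input mix1 + mix2.
    assignment[k] is 0 if source k is remixed into mix1, 1 if into mix2.
    """
    n = len(mix1)
    best_loss, best_assign = float("inf"), None
    # Each source goes to exactly one mixture: 2 ** len(est_sources) options.
    for assign in product((0, 1), repeat=len(est_sources)):
        remix = [[0.0] * n, [0.0] * n]
        for src, k in zip(est_sources, assign):
            for t in range(n):
                remix[k][t] += src[t]
        loss = mse(mix1, remix[0]) + mse(mix2, remix[1])
        if loss < best_loss:
            best_loss, best_assign = loss, assign
    return best_loss, best_assign


mix1 = [1.0, 2.0, 3.0]
mix2 = [0.0, -1.0, 1.0]
estimates = [[1.0, 0.0, 1.0], [0.0, -1.0, 1.0], [0.0, 2.0, 2.0]]
print(mixit_loss(mix1, mix2, estimates))  # (0.0, (0, 1, 0))
```

Because no isolated reference sources are needed, an objective of this kind can be applied to real recordings, which is consistent with the abstract's stated motivation of improving real-world performance.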

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing

Methods: Test