Speech Separation with Pretrained Frontend to Minimize Domain Mismatch

Wupeng Wang; Zexu Pan; Xinke Li; Shuai Wang; Haizhou Li

arXiv:2411.03085·cs.SD·November 6, 2024

TL;DR

This paper introduces a self-supervised, domain-invariant pretrained frontend for speech separation that reduces the domain gap between synthetic training data and real-world applications, improving separation quality.

Contribution

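A domain-invariant pretrained (DIP) frontend trained with two pretext tasks, mixture predictive coding (MPC) and mixture invariant coding (MIC). The frontend captures contextual cues shared by real and synthetic mixtures, so that separation models trained on synthetic data transfer better to real recordings.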

Findings

1. DIP frontend outperforms existing models on standard benchmarks.
2. Pretraining improves speech separation quality in real-world scenarios.
3. The approach effectively reduces domain mismatch in speech separation.

Abstract

Speech separation seeks to separate individual speech signals from a speech mixture. Most separation models are trained on synthetic data because target reference signals are unavailable in real-world cocktail party scenarios. As a result, there is a domain gap between real and synthetic data when speech separation models are deployed in real-world applications. In this paper, we propose a self-supervised domain-invariant pretrained (DIP) frontend that is exposed to mixture data without the need for target reference speech. The DIP frontend uses a Siamese network with two pretext tasks, mixture predictive coding (MPC) and mixture invariant coding (MIC), to capture shared contextual cues between real and synthetic unlabeled mixtures. Subsequently, we freeze the DIP frontend as a feature extractor when training the downstream speech separation models on synthetic…
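
The abstract's point that synthetic mixtures come with target references, while real recordings do not, can be illustrated with a minimal Python sketch. This is not the paper's data pipeline: the sine tones are toy stand-ins for speech, and the helper names (`sine_source`, `power`, `mix_at_snr`), the 8 kHz sample rate, and the 5 dB ratio are all assumptions chosen for illustration.

```python
import math


def sine_source(freq_hz, n_samples, sample_rate=8000):
    """Toy stand-in for a clean speech signal (a pure tone)."""
    return [math.sin(2 * math.pi * freq_hz * t / sample_rate)
            for t in range(n_samples)]


def power(x):
    """Mean squared amplitude of a signal."""
    return sum(v * v for v in x) / len(x)


def mix_at_snr(s1, s2, snr_db):
    """Scale s2 so that s1 is snr_db louder than s2, then add the two.

    Because the mixture is built from known sources, both references
    are returned alongside it. A real recording offers only the mixture.
    """
    gain = math.sqrt(power(s1) / (power(s2) * 10 ** (snr_db / 10)))
    s2_scaled = [gain * v for v in s2]
    mixture = [a + b for a, b in zip(s1, s2_scaled)]
    return mixture, s1, s2_scaled


# Hypothetical example: two tones mixed with a 5 dB level difference.
s1 = sine_source(220.0, 800)
s2 = sine_source(330.0, 800)
mixture, ref1, ref2 = mix_at_snr(s1, s2, snr_db=5.0)
```

A supervised separation loss needs `ref1` and `ref2`, which exist here only because the mixture was simulated. A self-supervised frontend, as described in the abstract, would see only `mixture`, which is why it can also be exposed to real recordings.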

Code & Models

Repositories

Wufan0Willan/DIP
PyTorch · Official

Taxonomy

Methods: Siamese Network · ALIGN