Stabilizing Label Assignment for Speech Separation by Self-supervised Pre-training

Sung-Feng Huang; Shun-Po Chuang; Da-Rong Liu; Yi-Chen Chen; Gene-Ping Yang; Hung-yi Lee

arXiv:2010.15366 · cs.SD · August 24, 2021

TL;DR

This paper introduces a self-supervised pre-training method to stabilize label assignment in speech separation models, improving convergence speed and performance.

Contribution

It proposes a novel self-supervised pre-training approach specifically designed to stabilize label assignment in speech separation training.

Findings

01

Self-supervised pre-training improves speech separation performance when a suitable self-supervised approach is chosen.

02

The method stabilizes label assignment, leading to faster convergence.

03

The improvements hold across several typical speech separation models and two different datasets.
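Finding 02 links faster convergence to a more stable label assignment. One simple way to quantify stability, offered here as an illustrative assumption rather than the paper's exact metric, is the fraction of training utterances whose chosen speaker permutation (the label assignment picked by permutation invariant training) differs between two consecutive epochs. A lower rate means a more stable assignment. The helper name `switch_rate` and the dictionary layout are hypothetical.

```python
from typing import Dict, Hashable, Tuple

# A label assignment: perm[i] is the reference speaker assigned to output i.
Assignment = Tuple[int, ...]


def switch_rate(prev: Dict[Hashable, Assignment],
                curr: Dict[Hashable, Assignment]) -> float:
    """Fraction of utterances seen in both epochs whose assignment changed.

    Hypothetical stability measure for illustration only; the paper may
    count label-assignment switches differently.
    """
    shared = prev.keys() & curr.keys()  # utterances present in both epochs
    if not shared:
        return 0.0
    switched = sum(1 for utt in shared if prev[utt] != curr[utt])
    return switched / len(shared)


# Usage: utterances "b" and "d" flip their assignment between epochs.
epoch_1 = {"a": (0, 1), "b": (1, 0), "c": (0, 1), "d": (0, 1)}
epoch_2 = {"a": (0, 1), "b": (0, 1), "c": (0, 1), "d": (1, 0)}
print(switch_rate(epoch_1, epoch_2))  # 0.5
```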

Abstract

Speech separation is well developed, largely thanks to the very successful permutation invariant training (PIT) approach, but the frequent label assignment switching that occurs during PIT training remains a problem when faster convergence and better achievable performance are desired. In this paper, we propose performing self-supervised pre-training to stabilize label assignment when training the speech separation model. Experiments with several types of self-supervised approaches, several typical speech separation models, and two different datasets show that very good improvements are achievable if a proper self-supervised approach is chosen.
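The abstract builds on permutation invariant training (PIT): because the separator's output channels have no fixed speaker order, the loss is computed under every assignment of outputs to reference speakers, and the cheapest one is used. Below is a minimal pure-Python sketch of utterance-level PIT, assuming mean-squared error as the per-speaker cost. Real separation systems typically use a signal-level objective such as SI-SNR instead, and the names `pit_loss` and `mse` are hypothetical, not taken from the official repository. The permutation it returns is the label assignment; when it flips for the same mixture from one epoch to the next, that is the switching the abstract describes.

```python
from itertools import permutations
from typing import List, Sequence, Tuple

Signal = Sequence[float]


def mse(est: Signal, ref: Signal) -> float:
    """Mean squared error between two equal-length signals."""
    return sum((e - r) ** 2 for e, r in zip(est, ref)) / len(ref)


def pit_loss(estimates: List[Signal],
             references: List[Signal]) -> Tuple[float, Tuple[int, ...]]:
    """Utterance-level PIT with an assumed MSE cost.

    Tries every assignment of output channels to reference speakers and
    returns (loss, perm) for the cheapest one, where perm[i] is the
    reference index assigned to output i.
    """
    n = len(references)
    best_loss, best_perm = float("inf"), tuple(range(n))
    for perm in permutations(range(n)):
        # Average per-speaker cost under this candidate assignment.
        loss = sum(mse(estimates[i], references[perm[i]]) for i in range(n)) / n
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


# Usage: output 0 resembles speaker 1 and output 1 resembles speaker 0.
refs = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
ests = [[0.1, 0.9, 0.0], [0.9, 0.1, 1.0]]
loss, perm = pit_loss(ests, refs)
print(perm)  # (1, 0)
```

Enumerating all permutations costs n! evaluations, which is cheap for the two- or three-speaker mixtures common in separation benchmarks.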

Code & Models

Repositories

SungFeng-Huang/SSL-pretraining-separation (PyTorch · Official)

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing