Semi-Supervised Singing Voice Separation with Noisy Self-Training

Zhepei Wang; Ritwik Giri; Umut Isik; Jean-Marc Valin; Arvindh Krishnaswamy

arXiv:2102.07961 · eess.AS · February 17, 2021 · ICASSP · 1 cites

TL;DR

This paper introduces a semi-supervised approach for singing voice separation that uses noisy self-training to leverage unlabeled data, improving performance over traditional supervised methods.

Contribution

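It adapts the noisy self-training framework to singing voice separation: a teacher network trained on a small labeled set pseudo-labels a large corpus of unlabeled mixtures, and a larger student network is trained on the combined data. This addresses the scarcity of ground-truth recordings with clean sources.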

Findings

1. Self-training improves separation quality.
2. Data augmentation enhances model performance.
3. The self-trained model outperforms supervised baselines.

Abstract

Recent progress in singing voice separation has primarily focused on supervised deep learning methods. However, the scarcity of ground-truth data with clean musical sources has long been a problem. Given a limited set of labeled data, we present a method to leverage a large volume of unlabeled data to improve the model's performance. Following the noisy self-training framework, we first train a teacher network on the small labeled dataset and infer pseudo-labels from the large corpus of unlabeled mixtures. Then, a larger student network is trained on the combined ground-truth and self-labeled datasets. Empirical results show that the proposed self-training scheme, along with data augmentation methods, effectively leverages the large unlabeled corpus and obtains superior performance compared to supervised methods.
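
The teacher-student procedure described in the abstract can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the authors' implementation: the "network" is a hypothetical per-bin gain fitted by least squares, the student has the same capacity as the teacher (the paper uses a larger student), and a random gain stands in for the paper's data augmentation. All function names, the toy mask, and the dataset sizes are invented for the example.

```python
import random

def fit_gains(pairs, n_bins):
    """Least-squares per-bin gain g_k minimizing sum (g_k * x_k - v_k)^2.

    Stand-in for training a separation network (assumption for this sketch).
    """
    gains = []
    for k in range(n_bins):
        num = sum(x[k] * v[k] for x, v in pairs)
        den = sum(x[k] * x[k] for x, v in pairs)
        gains.append(num / den if den > 0 else 0.0)
    return gains

def separate(gains, mixture):
    """Apply the per-bin gains to a mixture to estimate the vocal."""
    return [g * x for g, x in zip(gains, mixture)]

def augment(pair, rng):
    """Scale mixture and target by one random gain (hypothetical noise choice)."""
    scale = rng.uniform(0.5, 1.5)
    x, v = pair
    return [scale * a for a in x], [scale * b for b in v]

def noisy_self_training(labeled, unlabeled, n_bins, rng, copies=2):
    # Step 1: train the teacher on the small labeled set.
    teacher = fit_gains(labeled, n_bins)
    # Step 2: infer pseudo-labels for the unlabeled mixtures.
    pseudo = [(x, separate(teacher, x)) for x in unlabeled]
    # Step 3: train the student on ground-truth plus self-labeled data,
    # with noise (augmentation) applied to the student's training examples.
    combined = labeled + pseudo
    noisy = [augment(p, rng) for p in combined for _ in range(copies)]
    student = fit_gains(noisy, n_bins)
    return teacher, pseudo, student

# Toy data: the vocal is a fixed fraction of each mixture bin (assumption).
rng = random.Random(0)
TRUE_MASK = [0.8, 0.5, 0.2]

def make_mixture():
    return [rng.uniform(0.1, 1.0) for _ in TRUE_MASK]

labeled = []
for _ in range(5):
    x = make_mixture()
    v = [m * a + rng.gauss(0, 0.01) for m, a in zip(TRUE_MASK, x)]
    labeled.append((x, v))
unlabeled = [make_mixture() for _ in range(50)]

teacher, pseudo, student = noisy_self_training(
    labeled, unlabeled, len(TRUE_MASK), rng
)
```

The sketch only shows the data flow (teacher, pseudo-labels, noisy student training). Because the toy model is linear and the student is no larger than the teacher, it is not expected to reproduce the performance gains reported in the paper.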

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Speech Recognition and Synthesis