SVSGAN: Singing Voice Separation via Generative Adversarial Network

Zhe-Cheng Fan; Yen-Lin Lai; Jyh-Shing Roger Jang

arXiv:1710.11428 · cs.SD · November 15, 2017 · 5 cites

TL;DR

This paper introduces SVSGAN, a GAN-based framework for singing voice separation that combines supervised initialization with unsupervised adversarial training, and reports improved separation performance on three benchmark datasets.

Contribution

The paper proposes a new GAN-based approach for singing voice separation that leverages distribution matching and combines supervised and unsupervised training phases.
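The two training phases can be illustrated as two kinds of loss. The sketch below assumes that phase 1 minimizes a mean-squared error between the separated spectra and the clean spectra, and that phase 2 uses a standard binary cross-entropy GAN objective on raw discriminator scores (logits). These loss choices and all function names (`supervised_loss`, `discriminator_loss`, `generator_adv_loss`) are hypothetical illustrations, not the paper's confirmed formulation. The networks themselves are omitted, so only the objectives are shown.

```python
import numpy as np

def mse(a, b):
    """Mean-squared error between two arrays of the same shape."""
    return float(np.mean((np.asarray(a) - np.asarray(b)) ** 2))

def bce_with_logits(logits, target):
    """Numerically stable binary cross-entropy on raw (pre-sigmoid) scores."""
    x = np.asarray(logits, dtype=float)
    return float(np.mean(np.maximum(x, 0.0) - x * target + np.log1p(np.exp(-np.abs(x)))))

# Phase 1 (supervised, assumed MSE): match the separated spectra to the clean ones.
def supervised_loss(est_voice, est_accomp, clean_voice, clean_accomp):
    return mse(est_voice, clean_voice) + mse(est_accomp, clean_accomp)

# Phase 2 (adversarial, assumed cross-entropy GAN): the discriminator labels
# clean spectra as real (1) and separated spectra as fake (0).
def discriminator_loss(d_real_logits, d_fake_logits):
    return bce_with_logits(d_real_logits, 1.0) + bce_with_logits(d_fake_logits, 0.0)

# The generator (separator) tries to make its outputs score as real.
def generator_adv_loss(d_fake_logits):
    return bce_with_logits(d_fake_logits, 1.0)
```

In this sketch, phase 1 would update only the separator with `supervised_loss`. Phase 2 would alternate between updates that minimize `discriminator_loss` and updates that minimize `generator_adv_loss`, starting from the phase-1 weights.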

Findings

01

Improved separation performance on the MIR-1K, iKala, and DSD100 datasets.

02

Effective use of a GAN for audio source separation.

03

Outperforms some existing deep learning methods.

Abstract

Separating two sources from an audio mixture is an important task with many applications. It is a challenging problem because only one channel is available for analysis. In this paper, we propose a novel framework for singing voice separation using a generative adversarial network (GAN) with a time-frequency masking function. The mixture spectra are treated as one distribution and are mapped to the clean spectra, which are treated as another distribution. This mapping between the two distributions is learned during adversarial training. In contrast with current deep learning approaches to source separation, the parameters of the proposed framework are first initialized in a supervised setting and then optimized by the GAN training procedure in an unsupervised setting. Experimental results on three datasets (MIR-1K, iKala and…
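A time-frequency masking function turns two raw network outputs into separated spectra that always add up to the mixture. The sketch below assumes a soft (ratio) mask computed from two nonnegative magnitude estimates, one for the voice and one for the accompaniment, each shaped (frequency bins, frames). The exact masking function used in SVSGAN may differ, and `soft_tf_mask` and `eps` are hypothetical names chosen for illustration.

```python
import numpy as np

def soft_tf_mask(y1_hat, y2_hat, mixture_mag, eps=1e-8):
    """Apply a soft time-frequency mask to a mixture magnitude spectrogram.

    Assumption: y1_hat (voice) and y2_hat (accompaniment) are raw network
    outputs with the same shape as mixture_mag, e.g. (freq_bins, frames).
    """
    y1 = np.abs(y1_hat)
    y2 = np.abs(y2_hat)
    mask = y1 / (y1 + y2 + eps)              # per-bin ratio in [0, 1]
    voice = mask * mixture_mag               # masked estimate of the singing voice
    accomp = (1.0 - mask) * mixture_mag      # complementary mask for the accompaniment
    return voice, accomp                     # voice + accomp == mixture_mag

# Usage on random stand-in data (513 bins corresponds to a 1024-point STFT).
rng = np.random.default_rng(0)
mix = rng.random((513, 10))
voice, accomp = soft_tf_mask(rng.random((513, 10)), rng.random((513, 10)), mix)
```

Because the two masks are complementary, the estimates partition the mixture energy in each time-frequency bin. The network therefore only has to decide how to split each bin between the sources, and does not need to reconstruct the spectra from scratch.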

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Speech Recognition and Synthesis

Methods: Convolution