Neural Vocoder Feature Estimation for Dry Singing Voice Separation

Jaekwon Im; Soonbeom Choi; Sangeon Yong; Juhan Nam

arXiv:2211.15948 · cs.SD · November 30, 2022

Open Access

TL;DR

This paper introduces a singing voice separation method that predicts the mel-spectrogram of the dry (reverb-free) vocal from the mixture and synthesizes the waveform with a neural vocoder. Removing the reverberation makes the isolated voice easier to reuse, and the method improves separation quality over existing models.

Contribution

The paper proposes predicting dry singing voice mel-spectrograms as neural vocoder features and adds a singing voice detector, moving separation beyond conventional spectrogram masking.

Findings

01 Outperforms state-of-the-art models in objective metrics

02 Achieves better dereverberation and separation quality

03 Improves reusability of isolated singing voices

Abstract

Singing voice separation (SVS) is a task that separates singing voice audio from its mixture with instrumental audio. Previous SVS studies have mainly employed the spectrogram masking method, which requires a large dimensionality in predicting the binary masks. In addition, they focused on extracting a vocal stem that retains the wet sound with the reverberation effect, which may hinder the reusability of the isolated singing voice. This paper addresses these issues by predicting the mel-spectrogram of dry singing voices from the mixed audio as neural vocoder features and synthesizing the singing voice waveforms with the neural vocoder. We experimented with two separation methods: one predicts binary masks in the mel-spectrogram domain, and the other directly predicts the mel-spectrogram. Furthermore, we add a singing voice detector to identify the singing voice segments over…
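
The two separation methods named in the abstract can be contrasted with a minimal NumPy sketch. This is an illustration under stated assumptions, not the authors' implementation: the function names, the 0.5 threshold, the 80-bin mel resolution, and the use of random arrays in place of network outputs are all hypothetical, and the mel-spectrograms are assumed to hold linear (non-log) magnitudes so that a mask can be applied by multiplication.

```python
import numpy as np


def mask_based_estimate(mix_mel, mask_scores, threshold=0.5):
    """Masking variant: binarize per-bin scores and gate the mixture mel.

    mix_mel and mask_scores have shape (mel_bins, frames). The threshold
    is a hypothetical choice for this sketch.
    """
    binary_mask = (mask_scores >= threshold).astype(mix_mel.dtype)
    return binary_mask * mix_mel


def direct_estimate(predicted_mel):
    """Direct variant: use the predicted mel as-is, clipped to be non-negative."""
    return np.maximum(predicted_mel, 0.0)


# Random stand-ins for a mixture mel-spectrogram and two network outputs.
rng = np.random.default_rng(0)
mix_mel = rng.random((80, 100))           # 80 mel bins, 100 frames (assumed sizes)
mask_scores = rng.random((80, 100))       # stand-in for predicted mask scores
raw_prediction = rng.standard_normal((80, 100))  # stand-in for a predicted mel

masked = mask_based_estimate(mix_mel, mask_scores)
direct = direct_estimate(raw_prediction)
```

With a binary mask, every bin of the estimate is either zero or the mixture value, so the estimate can never contain energy that is absent from the mixture. A directly predicted mel-spectrogram has no such constraint. In either case the resulting mel-spectrogram would then be passed to a neural vocoder to synthesize the waveform, which is outside the scope of this sketch.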

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing