Content based singing voice source separation via strong conditioning using aligned phonemes

Gabriel Meseguer-Brocal; Geoffroy Peeters

arXiv:2008.02070 · eess.AS · August 6, 2020 · 6 citations

TL;DR

This paper introduces a new dataset with time-aligned phonemes and a neural network model that uses strong phoneme conditioning to improve singing voice source separation.

Contribution

It provides the first aligned phoneme dataset for singing voice separation and demonstrates the effectiveness of strong phoneme conditioning in a U-Net model.
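
Because the dataset aligns lyrics in time at the word level and attaches phonetic information to each word, a model that reads phonemes frame by frame first needs them on the same time grid as the spectrogram. The sketch below is one hypothetical way to do that, not the authors' preprocessing. It assumes each word arrives as a (start_seconds, end_seconds, phonemes) tuple, that all phonemes of a word are marked active over the whole word span (the alignment does not place individual phonemes inside a word), and that the caller supplies the phoneme inventory and frame_seconds (hop length divided by sample rate). The name phoneme_activation_matrix is illustrative.

```python
# Hypothetical sketch (not the authors' code): rasterize word-level aligned
# lyrics with phonetic transcriptions into a binary phoneme x frame matrix.
import math
from typing import Dict, List, Sequence, Tuple

# (start_seconds, end_seconds, phonemes of the word): assumed input format
WordAlignment = Tuple[float, float, Sequence[str]]


def phoneme_activation_matrix(
    words: Sequence[WordAlignment],
    inventory: Sequence[str],
    n_frames: int,
    frame_seconds: float,
) -> List[List[int]]:
    """Return a len(inventory) x n_frames matrix of 0/1 activations.

    Frame t covers [t * frame_seconds, (t + 1) * frame_seconds).
    Assumption: since the alignment is word-level only, every phoneme of a
    word is marked active in every frame that overlaps the word.
    """
    index: Dict[str, int] = {p: i for i, p in enumerate(inventory)}
    matrix = [[0] * n_frames for _ in inventory]
    for start, end, phonemes in words:
        first = max(0, math.floor(start / frame_seconds))
        last = min(n_frames, math.ceil(end / frame_seconds))  # exclusive
        for phoneme in phonemes:
            row = matrix[index[phoneme]]  # raises KeyError for unknown phonemes
            for t in range(first, last):
                row[t] = 1
    return matrix


if __name__ == "__main__":
    # Two toy words on a 0.25 s frame grid (values chosen for illustration).
    inventory = ["N", "OW", "HH"]
    words = [(0.0, 0.5, ["OW"]), (0.75, 1.0, ["N", "OW"])]
    for name, row in zip(inventory, phoneme_activation_matrix(words, inventory, 4, 0.25)):
        print(name, row)
```

In the toy run, OW is active in frames 0, 1 and 3, N only in frame 3, and HH never, so the matrix rows line up one-to-one with spectrogram frames.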

Findings

01. Phoneme conditioning improves separation quality.
02. Aligned phoneme data enhances model performance.
03. Strong conditioning outperforms weak conditioning methods.

Abstract

Informed source separation has recently gained renewed interest with the introduction of neural networks and the availability of large multitrack datasets containing both the mixture and the separated sources. These approaches use prior information about the target source to improve separation. Historically, Music Information Retrieval researchers have focused primarily on score-informed source separation, but more recent approaches explore lyrics-informed source separation. However, because of the lack of multitrack datasets with time-aligned lyrics, models use weak conditioning with non-aligned lyrics. In this paper, we present a multimodal multitrack dataset with lyrics aligned in time at the word level with phonetic information as well as explore strong conditioning using the aligned phonemes. Our model follows a U-Net architecture and takes as input both the magnitude spectrogram…
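
The abstract contrasts weak conditioning on non-aligned lyrics with strong conditioning on aligned phonemes inside a U-Net. A common way to condition a U-Net is FiLM (feature-wise linear modulation), which scales and shifts each feature channel by parameters gamma and beta computed from the conditioning input; this sketch assumes that mechanism, which the truncated abstract does not spell out. Under that assumption, strong conditioning computes a separate (gamma, beta) for every time frame from the phoneme activations at that frame, while the weak variant here, an illustrative stand-in for non-aligned input, pools the phonemes over the whole excerpt and applies one time-invariant (gamma, beta) per channel. The projection weights w_gamma and w_beta, the [channel][freq][time] layout, and all function names are hypothetical; in a trained model the projection would be learned.

```python
# Hypothetical sketch: weak (time-invariant) vs strong (time-aligned)
# FiLM-style conditioning of one feature map. Not the authors' code.
from typing import List

Matrix = List[List[float]]            # rows x columns
FeatureMap = List[List[List[float]]]  # [channel][freq][time]


def project(weights: Matrix, activations: Matrix) -> Matrix:
    """(channels x phonemes) @ (phonemes x frames) -> channels x frames."""
    n_frames = len(activations[0])
    return [
        [sum(w * activations[k][t] for k, w in enumerate(row)) for t in range(n_frames)]
        for row in weights
    ]


def film(x: FeatureMap, gamma: Matrix, beta: Matrix) -> FeatureMap:
    """y[c][f][t] = gamma[c][t] * x[c][f][t] + beta[c][t], shared over frequency."""
    return [
        [[gamma[c][t] * v + beta[c][t] for t, v in enumerate(freq_row)] for freq_row in channel]
        for c, channel in enumerate(x)
    ]


def strong_condition(x: FeatureMap, phonemes: Matrix, w_gamma: Matrix, w_beta: Matrix) -> FeatureMap:
    # One (gamma, beta) per channel and per frame, following the aligned phonemes.
    return film(x, project(w_gamma, phonemes), project(w_beta, phonemes))


def weak_condition(x: FeatureMap, phonemes: Matrix, w_gamma: Matrix, w_beta: Matrix) -> FeatureMap:
    # Illustrative stand-in for non-aligned input: average activations over
    # time, so every frame receives the same (gamma, beta).
    n_frames = len(phonemes[0])
    pooled = [[sum(row) / n_frames] for row in phonemes]  # phonemes x 1
    gamma = [[g[0]] * n_frames for g in project(w_gamma, pooled)]
    beta = [[b[0]] * n_frames for b in project(w_beta, pooled)]
    return film(x, gamma, beta)


if __name__ == "__main__":
    x = [[[1.0] * 4 for _ in range(2)]]      # 1 channel, 2 freq bins, 4 frames
    phonemes = [[1, 1, 0, 0], [0, 0, 1, 1]]  # phoneme A, then phoneme B
    w_gamma, w_beta = [[2.0, 0.0]], [[0.0, 1.0]]
    print(strong_condition(x, phonemes, w_gamma, w_beta)[0][0])  # changes at frame 2
    print(weak_condition(x, phonemes, w_gamma, w_beta)[0][0])    # constant over time
```

In the toy run the strongly conditioned row becomes [2.0, 2.0, 1.0, 1.0], switching when phoneme B takes over at frame 2, while the weak version applies 1.5 to every frame. That is the intuition behind time-aligned conditioning: the network is told when each phoneme is sung, not only which phonemes occur somewhere in the excerpt.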

Code & Models

Repositories

gabolsgabs/vunet (TensorFlow, official)

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Speech Recognition and Synthesis

Methods: Concatenated Skip Connection · Max Pooling · Convolution · U-Net