Transfer Learning for Improving Singing-voice Detection in Polyphonic Instrumental Music

Yuanbo Hou; Frank K. Soong; Jian Luan; Shengchen Li

arXiv:2008.04658·eess.AS·August 12, 2020

TL;DR

This paper introduces a transfer learning-based data augmentation method to improve singing-voice detection in polyphonic music, addressing the scarcity of labeled data and reducing domain mismatch.

Contribution

It proposes a novel transfer learning approach that enhances singing-voice detection accuracy by leveraging artificial data and small real datasets.

Findings

01

F-score improved from 89.5% to 93.2%.

02

Artificial data combined with transfer learning enhances detection accuracy.

03

Method reduces the need for extensive frame-level labeling.
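The F-score quoted above is a frame-level classification metric. As a minimal sketch of how such a score can be computed, assuming binary labels where 1 means vocal (the positive class) and 0 means non-vocal, the hypothetical helper `frame_f_score` below combines precision and recall. The paper's exact evaluation protocol is not given on this page, so treat this as illustrative only.

```python
def frame_f_score(reference, predicted):
    """Frame-level F-score for binary vocal (1) / non-vocal (0) labels.

    Hypothetical helper for illustration; not taken from the paper's code.
    """
    if len(reference) != len(predicted):
        raise ValueError("label sequences must have the same length")
    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r == 1 and p == 1)  # vocal frames found
    fp = sum(1 for r, p in pairs if r == 0 and p == 1)  # false alarms
    fn = sum(1 for r, p in pairs if r == 1 and p == 0)  # missed vocal frames
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy example: 8 frames of ground truth and detector output.
reference = [0, 1, 1, 1, 0, 0, 1, 1]
predicted = [0, 1, 1, 0, 0, 1, 1, 1]
score = frame_f_score(reference, predicted)
print(f"F-score: {score:.3f}")
```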

Abstract

Detecting singing voice in polyphonic instrumental music is critical to music information retrieval. Training a robust vocal detector requires a large dataset labeled as vocal or non-vocal at the frame level. However, frame-level labeling is time-consuming and labor-intensive, so few well-labeled datasets are available for singing-voice detection (S-VD). Hence, we propose a data augmentation method for S-VD by transfer learning. In this study, clean speech clips with voice activity endpoints and separate instrumental music clips are artificially added together to simulate polyphonic vocals to train a vocal/non-vocal detector. Because articulation and phonation differ between speaking and singing, the vocal detector trained with the artificial dataset does not match well with polyphonic music, which is singing vocals together with the instrumental…
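The artificial-data step described in the abstract (adding clean speech to instrumental music and reusing the voice activity endpoints as labels) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `mix_with_labels`, the `frame_len` and `snr_db` parameters, the SNR-based scaling of the music, and the "at least half the frame is active" labeling rule are all hypothetical choices made for the example.

```python
import math


def mix_with_labels(speech, music, endpoints, frame_len, snr_db=0.0):
    """Add speech to music and derive frame-level vocal/non-vocal labels.

    speech, music: equal-length lists of float samples.
    endpoints: list of (start, end) sample indices where speech is active.
    frame_len: samples per frame (assumed, non-overlapping frames).
    snr_db: target speech-to-music power ratio in dB (assumed parameter).
    """
    if len(speech) != len(music):
        raise ValueError("speech and music must have the same length")

    # Mark which samples fall inside a voice-activity segment.
    active_mask = [0] * len(speech)
    for start, end in endpoints:
        for i in range(max(start, 0), min(end, len(speech))):
            active_mask[i] = 1

    # Scale the music so the speech-to-music power ratio matches snr_db.
    active = [s for s, a in zip(speech, active_mask) if a]
    speech_power = sum(x * x for x in active) / max(len(active), 1)
    music_power = sum(x * x for x in music) / max(len(music), 1)
    if speech_power > 0 and music_power > 0:
        gain = math.sqrt(speech_power / (music_power * 10 ** (snr_db / 10)))
    else:
        gain = 1.0
    mixture = [s + gain * m for s, m in zip(speech, music)]

    # A frame is labeled vocal (1) if at least half of its samples are active.
    labels = []
    for f in range(0, len(speech) - frame_len + 1, frame_len):
        n_active = sum(active_mask[f:f + frame_len])
        labels.append(1 if 2 * n_active >= frame_len else 0)
    return mixture, labels


# Toy signals: speech is active only in samples 4..11 of a 16-sample clip.
speech = [0.0] * 4 + [1.0, -1.0] * 4 + [0.0] * 4
music = [0.5, -0.5] * 8
mixture, labels = mix_with_labels(speech, music, [(4, 12)], frame_len=4)
print(labels)
```

In practice the samples would come from audio files and be handled as arrays; plain lists are used here only to keep the sketch dependency-free. A detector trained on such mixtures would then be adapted to real polyphonic music, which is the transfer-learning step the abstract goes on to motivate.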

Code & Models

Repositories

moses1994/singing-voice-detection
Official

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Music Technology and Sound Studies