Song Data Cleansing for End-to-End Neural Singer Diarization Using Neural Analysis and Synthesis Framework

Hokuto Munakata; Ryo Terashima; Yusuke Fujita

arXiv:2406.16315·eess.AS·June 25, 2024

TL;DR

This paper introduces a neural data cleansing method using NANSY++ to improve end-to-end neural singer diarization by converting choral singing into solo singing data, significantly reducing diarization errors.

Contribution

The approach uses a pre-trained NANSY++ model for data cleansing, which makes it possible to train singer diarization models effectively even when most of the available song data contains choral singing sections.

Findings

01. Achieved a 14.8 point reduction in diarization error rate.

02. Effectively converts choral singing to solo singing data for training.

03. Improves diarization performance on popular duet songs.
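
For readers unfamiliar with the metric behind the first finding, the following is a minimal sketch of a frame-level diarization error rate (DER). It is not the paper's evaluation code: the function name `frame_der`, the 0/1 frame representation, and the absence of a forgiveness collar are all assumptions made for illustration. Per frame, the sum of missed singing, false alarms, and singer confusion equals `max(n_ref, n_hyp) - n_correct`, and the total is divided by the amount of reference singing.

```python
from itertools import permutations


def frame_der(reference, hypothesis):
    """Simplified frame-level DER (hypothetical helper, no collar).

    reference, hypothesis: lists of frames; each frame is a tuple of
    0/1 activity flags, one flag per singer.
    Returns the error rate as a fraction of reference singing frames.
    """
    n_singers = len(reference[0])
    total_ref = sum(sum(frame) for frame in reference)
    if total_ref == 0:
        raise ValueError("reference contains no active singing frames")

    best_errors = None
    # The model's output order is arbitrary, so try every singer mapping
    # and keep the one with the fewest errors.
    for perm in permutations(range(n_singers)):
        errors = 0
        for ref, hyp in zip(reference, hypothesis):
            hyp_p = tuple(hyp[p] for p in perm)
            n_ref, n_hyp = sum(ref), sum(hyp_p)
            n_correct = sum(1 for r, h in zip(ref, hyp_p) if r and h)
            # miss + false alarm + confusion for this frame
            errors += max(n_ref, n_hyp) - n_correct
        if best_errors is None or errors < best_errors:
            best_errors = errors
    return best_errors / total_ref
```

With this definition, a "point" reduction would be an absolute difference between two such rates expressed in percent.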

Abstract

We propose a data cleansing method that utilizes a neural analysis and synthesis (NANSY++) framework to train an end-to-end neural diarization model (EEND) for singer diarization. Our proposed model converts song data containing choral singing, which is common in popular music and unsuitable for generating a simulated dataset, into solo singing data. This cleansing is based on NANSY++, a framework trained to reconstruct an input non-overlapped audio signal. We exploit the pre-trained NANSY++ to convert choral singing into clean, non-overlapped audio. This cleansing process mitigates the mislabeling of choral singing as solo singing and enables effective training of EEND models even when the majority of available song data contains choral singing sections. We experimentally evaluated the EEND model trained on a dataset built with our proposed method, using annotated popular…
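
To make the mislabeling problem concrete, here is a rough sketch of how a simulated training mixture might be built from tracks that are assumed to be solo. This is not the authors' pipeline: the function `simulate_mixture`, the sample-level labels, and the "nonzero sample means active" rule are hypothetical simplifications. Each track contributes exactly one label column, so a track that actually contains choral singing would still be labeled as a single singer.

```python
def simulate_mixture(tracks, offsets):
    """Overlay assumed-solo tracks and derive per-singer activity labels.

    tracks:  list of sample lists (floats), one list per singer.
    offsets: start position (in samples) of each track in the mixture.
    Returns (mixture, labels) where labels[t][k] is 1 when track k is
    active at sample t. Activity is crudely taken as "sample != 0";
    a real system would use frame-level voice activity instead.
    """
    length = max(off + len(tr) for tr, off in zip(tracks, offsets))
    mixture = [0.0] * length
    labels = [[0] * len(tracks) for _ in range(length)]
    for k, (track, off) in enumerate(zip(tracks, offsets)):
        for i, sample in enumerate(track):
            mixture[off + i] += sample
            if sample != 0.0:
                # One label column per track: this is only correct
                # if the track really holds a single singer.
                labels[off + i][k] = 1
    return mixture, labels
```

Under this setup, cleansing choral sections into solo audio before mixing is what keeps the one-track-one-singer assumption valid.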

Taxonomy

Topics: Music and Audio Processing

Methods: End-to-End Neural Diarization