DALI: a large Dataset of synchronized Audio, LyrIcs and notes, automatically created using teacher-student machine learning paradigm

Gabriel Meseguer-Brocal; Alice Cohen-Hadria; Geoffroy Peeters

arXiv:1906.10606 · eess.AS · June 26, 2019 · 47 cites

TL;DR

This paper introduces DALI, a large multimodal dataset of audio, lyrics, and notes, created through an iterative teacher-student machine learning process that improves alignment accuracy.

Contribution

We present DALI, a novel large-scale dataset with synchronized multimodal annotations, and demonstrate an iterative machine learning method for automatic dataset creation and alignment.

Findings

01

DALI contains 5358 audio tracks with aligned lyrics and notes.

02

The teacher-student learning paradigm improves alignment accuracy over iterations.

03

The methodology enables scalable, automatic dataset annotation with minimal manual effort.

Abstract

The goal of this paper is twofold. First, we introduce DALI, a large and rich multimodal dataset containing 5358 audio tracks with their time-aligned vocal melody notes and lyrics at four levels of granularity. Second, we explain our methodology, in which dataset creation and model learning interact through a teacher-student machine learning paradigm so that each benefits the other. We start with a set of manual annotations of draft time-aligned lyrics and notes made by non-expert users of Karaoke games. This set comes without audio, so we need to find the corresponding audio and adapt the annotations to it. To that end, we retrieve audio candidates from the Web. Each candidate is then turned into a singing-voice probability over time using a teacher, a deep convolutional neural network singing-voice detection system (SVD), trained on cleaned data. Comparing the time-aligned…
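
The abstract excerpt stops before it describes how the annotations are compared with the singing-voice probability. As a rough illustration only, the sketch below assumes the comparison is a normalized correlation between a binary voice-activity vector derived from the lyric timings and the teacher's per-frame probability, searched over time shifts. The function names, the frame rate, and the shift search are hypothetical and are not taken from the paper.

```python
import numpy as np

def annotation_to_activity(segments, n_frames, frame_rate):
    """Turn (start_sec, end_sec) lyric segments into a binary per-frame vector."""
    activity = np.zeros(n_frames)
    for start, end in segments:
        lo = max(0, int(round(start * frame_rate)))
        hi = min(n_frames, int(round(end * frame_rate)))
        activity[lo:hi] = 1.0
    return activity

def best_offset(activity, svd_prob, max_shift):
    """Return (shift_in_frames, score) maximizing the normalized correlation.

    Assumption: svd_prob is the teacher's singing-voice probability per frame,
    sampled at the same frame rate as `activity`.
    """
    best = (0, -np.inf)
    for shift in range(-max_shift, max_shift + 1):
        shifted = np.roll(activity, shift)
        # np.roll wraps around, so zero the frames that wrapped.
        if shift > 0:
            shifted[:shift] = 0.0
        elif shift < 0:
            shifted[shift:] = 0.0
        denom = np.linalg.norm(shifted) * np.linalg.norm(svd_prob)
        score = float(shifted @ svd_prob / denom) if denom > 0 else 0.0
        if score > best[1]:
            best = (shift, score)
    return best

# Toy example: the "audio" is the annotation delayed by 7 frames.
activity = annotation_to_activity([(1.0, 2.0), (4.0, 5.5)], n_frames=100, frame_rate=10)
svd_prob = np.roll(activity, 7)
shift, score = best_offset(activity, svd_prob, max_shift=20)
```

Under this assumption, the score could also rank several audio candidates for one annotation: the candidate with the highest best-shift score would be the most plausible match.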

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Music Technology and Sound Studies