DALI: a large Dataset of synchronized Audio, LyrIcs and notes, automatically created using teacher-student machine learning paradigm
Gabriel Meseguer-Brocal, Alice Cohen-Hadria, Geoffroy Peeters

TL;DR
This paper introduces DALI, a large multimodal dataset of audio, lyrics, and notes, created through an iterative teacher-student machine learning process that improves alignment accuracy.
Contribution
We present DALI, a novel large-scale dataset with synchronized multimodal annotations, and demonstrate an iterative machine learning method for automatic dataset creation and alignment.
Findings
DALI contains 5358 audio tracks with time-aligned vocal melody notes and lyrics, the lyrics at four levels of granularity.
The teacher-student learning paradigm improves alignment accuracy over iterations.
The methodology enables scalable, automatic dataset annotation with minimal manual effort.
Abstract
The goal of this paper is twofold. First, we introduce DALI, a large and rich multimodal dataset containing 5358 audio tracks with their time-aligned vocal melody notes and lyrics at four levels of granularity. The second goal is to explain our methodology where dataset creation and learning models interact using a teacher-student machine learning paradigm that benefits each other. We start with a set of manual annotations of draft time-aligned lyrics and notes made by non-expert users of Karaoke games. This set comes without audio. Therefore, we need to find the corresponding audio and adapt the annotations to it. To that end, we retrieve audio candidates from the Web. Each candidate is then turned into a singing-voice probability over time using a teacher, a deep convolutional neural network singing-voice detection system (SVD), trained on cleaned data. Comparing the time-aligned…
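The matching step described above (comparing a time-aligned annotation with the teacher's singing-voice probability for an audio candidate) can be illustrated with a small sketch. The abstract is truncated before it names the comparison, so everything here is an assumption for illustration: the annotation is reduced to a binary voice-activity vector, the similarity is a normalized correlation, and only a global time offset is searched. The function names, the frame rate, and the toy data are hypothetical and are not taken from the paper.

```python
import math

def annotation_to_activity(segments, n_frames, frame_rate):
    """Turn (start_s, end_s) annotation segments into a binary per-frame vector."""
    activity = [0.0] * n_frames
    for start, end in segments:
        first = max(0, int(round(start * frame_rate)))
        last = min(n_frames, int(round(end * frame_rate)))
        for i in range(first, last):
            activity[i] = 1.0
    return activity

def normalized_correlation(a, b):
    """Cosine-style similarity between two equal-length sequences."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0

def best_offset(segments, voice_prob, frame_rate, max_offset_s):
    """Search a global time offset (assumed search strategy) that best matches
    the annotation to the singing-voice probability curve."""
    n_frames = len(voice_prob)
    max_shift = int(round(max_offset_s * frame_rate))
    best = (0.0, -1.0)  # (offset in seconds, score)
    for shift in range(-max_shift, max_shift + 1):
        offset = shift / frame_rate
        shifted = [(s + offset, e + offset) for s, e in segments]
        activity = annotation_to_activity(shifted, n_frames, frame_rate)
        score = normalized_correlation(activity, voice_prob)
        if score > best[1]:
            best = (offset, score)
    return best

# Toy data: 10 frames per second, singing at 3-5 s and 7-9 s in the audio,
# while the draft annotation places the same phrases 2 s too early.
frame_rate = 10
voice_prob = [0.9 if 30 <= i < 50 or 70 <= i < 90 else 0.1 for i in range(100)]
draft_segments = [(1.0, 3.0), (5.0, 7.0)]

offset, score = best_offset(draft_segments, voice_prob, frame_rate, max_offset_s=4.0)
print(offset, round(score, 3))
```

In a setting like the one the abstract describes, a score of this kind could also serve to rank several audio candidates for the same annotation; whether the authors do exactly that cannot be read from the truncated text.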
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Music Technology and Sound Studies
