Scaling Up Music Information Retrieval Training with Semi-Supervised Learning

Yun-Ning Hung; Ju-Chiang Wang; Minz Won; Duc Le

arXiv:2310.01353 · eess.AS · October 3, 2023

Open Access

TL;DR

This paper demonstrates that scaling up both model size and unlabeled training data using semi-supervised learning significantly improves performance across multiple Music Information Retrieval tasks, achieving state-of-the-art results.

Contribution

It is the first to systematically study the combined effects of large-scale data and model size in semi-supervised MIR training.

Findings

01. Scaling the unlabeled training data up to 240k hours improves model performance.

02. Increasing model size from under 3M to nearly 100M parameters improves results.

03. Large-scale semi-supervised training outperforms both supervised and self-supervised baselines on several MIR tasks.

Abstract

In the era of data-driven Music Information Retrieval (MIR), the scarcity of labeled data has been one of the major obstacles to the success of MIR tasks. In this work, we leverage the semi-supervised teacher-student training approach to improve MIR tasks. For training, we scale up the unlabeled music data to 240k hours, which is much larger than any public MIR dataset. We iteratively create and refine the pseudo-labels in the noisy teacher-student training process. Knowledge expansion is also explored to iteratively scale up the model size from fewer than 3M to almost 100M parameters. We study the performance correlation between data size and model size in the experiments. By scaling up both model size and training data, our models achieve state-of-the-art results on several MIR tasks compared to models that are either trained in a supervised manner or based on a…
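The abstract describes an iterative noisy teacher-student loop with knowledge expansion but gives no implementation details. The following is a minimal NumPy sketch of that general recipe under toy assumptions, not the authors' code. `RandomFeatureTagger` and `noisy_student` are hypothetical names. The tagger is a stand-in for a real MIR model, with `width` playing the role of model size. Gaussian input noise and dropout are assumed substitutes for whatever noise the paper actually uses, and the teacher emits soft (probability) pseudo-labels. In each round, the current teacher labels the unlabeled clips without noise. A student of equal or larger width is then trained with noise on the labeled plus pseudo-labeled data, and that student becomes the next teacher.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class RandomFeatureTagger:
    """Hypothetical toy multi-label tagger: fixed random ReLU features plus a
    trainable linear head. `width` stands in for model size."""

    def __init__(self, in_dim, n_tags, width, rng):
        self.width = width
        self.proj = rng.normal(size=(in_dim, width)) / np.sqrt(in_dim)
        self.W = np.zeros((width, n_tags))
        self.b = np.zeros(n_tags)

    def features(self, x):
        # Scale by 1/sqrt(width) so one learning rate works for every size
        return np.maximum(x @ self.proj, 0.0) / np.sqrt(self.width)

    def predict_proba(self, x):
        # Inference path: no input noise, no dropout
        return sigmoid(self.features(x) @ self.W + self.b)

    def fit(self, x, y, rng, epochs=200, lr=1.0, noise_std=0.0, dropout=0.0):
        """Full-batch gradient descent on binary cross-entropy; y may be soft."""
        for _ in range(epochs):
            # Assumed noise: Gaussian input perturbation (stand-in for augmentation)
            xn = x + noise_std * rng.normal(size=x.shape)
            h = self.features(xn)
            if dropout > 0:
                # Inverted dropout on the features
                keep = rng.random(h.shape) >= dropout
                h = h * keep / (1.0 - dropout)
            p = sigmoid(h @ self.W + self.b)
            grad = (p - y) / len(x)  # dBCE/dlogits, averaged over clips
            self.W -= lr * h.T @ grad
            self.b -= lr * grad.sum(axis=0)
        return self


def noisy_student(x_lab, y_lab, x_unlab, widths, seed=0):
    """Iterative teacher-student training; widths[i] is the model size at round i."""
    if any(b < a for a, b in zip(widths, widths[1:])):
        raise ValueError("knowledge expansion: widths must be non-decreasing")
    rng = np.random.default_rng(seed)
    in_dim, n_tags = x_lab.shape[1], y_lab.shape[1]
    # Round 0: supervised teacher trained on labeled data only
    teacher = RandomFeatureTagger(in_dim, n_tags, widths[0], rng).fit(x_lab, y_lab, rng)
    for width in widths[1:]:
        # Clean (noise-free) teacher produces soft pseudo-labels
        pseudo = teacher.predict_proba(x_unlab)
        x_all = np.vstack([x_lab, x_unlab])
        y_all = np.vstack([y_lab, pseudo])
        # Equal-or-larger student trained with noise on labeled + pseudo-labeled data
        student = RandomFeatureTagger(in_dim, n_tags, width, rng)
        student.fit(x_all, y_all, rng, noise_std=0.1, dropout=0.2)
        teacher = student  # student becomes the next teacher
    return teacher


# Toy usage with synthetic "clip embeddings" (assumed data, not from the paper)
rng = np.random.default_rng(1)
x_lab = rng.normal(size=(32, 8))
y_lab = (x_lab[:, :3] > 0).astype(float)
x_unlab = rng.normal(size=(200, 8))
model = noisy_student(x_lab, y_lab, x_unlab, widths=[16, 32, 64])
print(model.predict_proba(x_unlab).shape)  # (200, 3)
```

The sketch keeps the teacher noise-free while noising the student. This asymmetry is the core of noisy teacher-student training: the student must reproduce clean teacher predictions from perturbed inputs, which is intended to let an equal or larger student improve on its teacher. The non-decreasing `widths` list mirrors, in toy form, the idea of growing model size across iterations.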

Taxonomy

Topics: Music and Audio Processing · Speech Recognition and Synthesis · Speech and Audio Processing