On the Effectiveness of Speech Self-supervised Learning for Music
Yinghao Ma, Ruibin Yuan, Yizhi Li, Ge Zhang, Xingran Chen, Hanzhi Yin, Chenghua Lin, Emmanouil Benetos, Anton Ragni, Norbert Gyenge, Ruibo Liu, Gus Xia, Roger Dannenberg, Yike Guo, Jie Fu

TL;DR
This paper investigates the adaptation of speech self-supervised learning (SSL) models to music information retrieval (MIR), showing that training on music data improves MIR performance while also revealing limitations in modeling polyphonic music.
Contribution
The paper introduces music2vec and musicHuBERT, music adaptations of the speech SSL models data2vec 1.0 and HuBERT, and systematically evaluates them on 13 MIR tasks.
Findings
Training with music data enhances MIR task performance.
Speech SSL models have limitations in modeling polyphonic music.
Empirical guidelines for future musical SSL design are proposed.
Abstract
Self-supervised learning (SSL) has shown promising results in various speech and natural language processing applications. However, its efficacy in music information retrieval (MIR) remains largely unexplored. While previous SSL models pre-trained on music recordings may have been mostly closed-source, recent speech models such as wav2vec 2.0 have shown promise in music modelling. Nevertheless, research on applying speech SSL models to music recordings has been limited. We explore the music adaptation of SSL with two distinct speech-related models, data2vec 1.0 and HuBERT, and refer to them as music2vec and musicHuBERT, respectively. We train SSL models with 95M parameters under various pre-training configurations and systematically evaluate their performance on 13 different MIR tasks. Our findings suggest that training with music data…
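For readers unfamiliar with data2vec 1.0, the objective that music2vec adapts, the sketch below illustrates its core idea as we understand it from the data2vec work. An exponential-moving-average (EMA) teacher sees the unmasked input. Its top-K Transformer block outputs are each instance-normalized over time and averaged to form regression targets. The student then predicts those targets only at masked time steps. This is not the authors' code: the function names, the plain mean-squared-error loss (data2vec itself uses a Smooth L1 loss), and the toy shapes are illustrative assumptions.

```python
import numpy as np


def instance_norm(x, eps=1e-5):
    # x: (T, D) block output; normalize each channel over the time axis,
    # with no learned scale or shift parameters.
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def data2vec_targets(teacher_layers, top_k):
    # teacher_layers: list of (T, D) arrays, one per Transformer block,
    # computed by the teacher on the *unmasked* input.
    top = teacher_layers[-top_k:]
    return np.mean([instance_norm(h) for h in top], axis=0)


def masked_regression_loss(student_pred, targets, mask):
    # Loss on masked time steps only (MSE here, a simplification of Smooth L1).
    diff = student_pred[mask] - targets[mask]
    return float(np.mean(diff ** 2))


def ema_update(teacher_params, student_params, tau):
    # Teacher weights track the student: teacher <- tau * teacher + (1 - tau) * student.
    return {
        name: tau * teacher_params[name] + (1.0 - tau) * student_params[name]
        for name in teacher_params
    }


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    layers = [rng.standard_normal((50, 16)) for _ in range(12)]  # toy 12-block teacher
    targets = data2vec_targets(layers, top_k=8)
    mask = np.zeros(50, dtype=bool)
    mask[10:20] = True  # hypothetical masked span
    student_pred = rng.standard_normal((50, 16))
    print(targets.shape, masked_regression_loss(student_pred, targets, mask))
```

Averaging several normalized layers, rather than regressing a single layer or a discrete code, is what distinguishes data2vec from HuBERT, whose masked-prediction targets are cluster IDs from an offline k-means step.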
Taxonomy
Topics: Music and Audio Processing · Speech Recognition and Synthesis · Music Technology and Sound Studies
