Towards Proper Contrastive Self-supervised Learning Strategies For Music Audio Representation
Jeong Choi, Seongwon Jang, Hyunsouk Cho, Sehee Chung

TL;DR
This paper investigates contrastive self-supervised learning strategies for music audio representation and evaluates their effectiveness across various music information retrieval tasks, providing insights into optimal strategies for different MIR applications.
Contribution
It empirically compares different contrastive self-supervised learning schemes for music audio and analyzes their suitability for various MIR tasks, guiding future strategy selection.
Findings
Different contrastive strategies capture distinct musical features.
Some strategies are more effective for specific MIR tasks.
The learned representations carry comprehensive information about the auditory characteristics of music in general.
Abstract
The common research goal of self-supervised learning is to extract a general representation that benefits arbitrary downstream tasks. In this work, we investigate music audio representations learned with different contrastive self-supervised learning schemes and empirically evaluate the embedding vectors on various music information retrieval (MIR) tasks that involve different levels of music perception. We analyze the results to discuss the proper direction of contrastive learning strategies for different MIR tasks. We show that these representations convey comprehensive information about the auditory characteristics of music in general, although each self-supervision strategy is more effective for certain aspects of that information.
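The abstract does not say which contrastive objective each scheme uses. As an illustrative sketch only, the snippet below assumes the widely used InfoNCE loss with cosine similarity and in-batch negatives. The names `cosine_similarity`, `info_nce_loss`, and `temperature` are hypothetical and not taken from the paper. Each anchor embedding (for example, one audio segment) is pulled toward its paired positive (for example, another segment of the same track) and pushed away from the other positives in the batch.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce_loss(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss over a batch (illustrative, not the paper's code).

    anchors[i] is expected to match positives[i]; every other
    positives[j] in the batch is treated as a negative for anchor i.
    """
    losses = []
    for i, anchor in enumerate(anchors):
        # Temperature-scaled similarity of this anchor to every candidate.
        logits = [cosine_similarity(anchor, p) / temperature for p in positives]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        max_logit = max(logits)
        log_denominator = max_logit + math.log(
            sum(math.exp(value - max_logit) for value in logits)
        )
        # Negative log-probability of picking the true positive.
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


# Toy 2-D embeddings: correctly paired views give a much lower loss
# than deliberately mismatched pairs.
embeddings = [[1.0, 0.0], [0.0, 1.0]]
swapped = [[0.0, 1.0], [1.0, 0.0]]
aligned_loss = info_nce_loss(embeddings, embeddings)
mismatched_loss = info_nce_loss(embeddings, swapped)
```

Under this assumption, what distinguishes one contrastive strategy from another is how anchor and positive pairs are formed, while the loss itself stays the same.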
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Music Technology and Sound Studies
Methods: Contrastive Learning
