Towards Contrastive Learning in Music Video Domain
Karel Veldkamp, Mariya Hendriksen, Zoltán Szlávik, Alexander Keijser

TL;DR
This paper explores the application of contrastive learning to the music video domain, finds it less effective than pre-trained networks, and analyzes the reasons for its limited success.
Contribution
It introduces a dual encoder contrastive learning approach for music videos and provides insights into its challenges and future directions.
Findings
Pre-trained networks outperform contrastive learning on music tagging and genre classification.
Contrastive learning struggles to effectively unify audio and video embeddings in music videos.
Qualitative analysis reveals difficulties in aligning multimodal representations.
Abstract
Contrastive learning is a powerful way of learning multimodal representations across various domains such as image-caption retrieval and audio-visual representation learning. In this work, we investigate if these findings generalize to the domain of music videos. Specifically, we create a dual encoder for the audio and video modalities and train it using a bidirectional contrastive loss. For the experiments, we use an industry dataset containing 550 000 music videos as well as the public Million Song Dataset, and evaluate the quality of learned representations on the downstream tasks of music tagging and genre classification. Our results indicate that pre-trained networks without contrastive fine-tuning outperform our contrastive learning approach when evaluated on both tasks. To gain a better understanding of the reasons contrastive learning was not successful for music videos, we…
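The abstract mentions a dual encoder trained with a bidirectional contrastive loss but does not give its exact form. As a minimal sketch, assuming the common symmetric InfoNCE formulation (cosine similarity, a temperature, and cross-entropy in both the audio-to-video and video-to-audio directions), the loss for a batch of paired embeddings could look like the following. The function names, the temperature value, and the toy embeddings are illustrative assumptions, not details from the paper.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (assumes the vector is nonzero)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def mean_diagonal_nll(logits):
    """Mean cross-entropy where the correct class of row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(v - m) for v in row))
        total += log_z - row[i]
    return total / len(logits)


def bidirectional_contrastive_loss(audio_emb, video_emb, temperature=0.1):
    """Symmetric InfoNCE over a batch of paired audio/video embeddings.

    Assumption: row i of audio_emb and row i of video_emb come from the
    same music video; every other pairing in the batch is a negative.
    """
    audio = [l2_normalize(a) for a in audio_emb]
    video = [l2_normalize(v) for v in video_emb]
    # Cosine similarity between every audio and every video clip, scaled.
    logits = [
        [sum(p * q for p, q in zip(a, v)) / temperature for v in video]
        for a in audio
    ]
    logits_t = [list(col) for col in zip(*logits)]  # video-to-audio view
    audio_to_video = mean_diagonal_nll(logits)
    video_to_audio = mean_diagonal_nll(logits_t)
    return 0.5 * (audio_to_video + video_to_audio)


# Toy batch: two clips with two-dimensional embeddings (hypothetical values).
aligned = bidirectional_contrastive_loss([[1.0, 0.0], [0.0, 1.0]],
                                         [[1.0, 0.0], [0.0, 1.0]])
mismatched = bidirectional_contrastive_loss([[1.0, 0.0], [0.0, 1.0]],
                                            [[0.0, 1.0], [1.0, 0.0]])
```

In this toy batch the loss is close to zero when each audio embedding matches its own video embedding and large when the pairs are swapped, which is the behavior the training objective rewards. In practice the embeddings would come from the two encoders and the loss would be computed with an autodiff framework rather than plain Python lists.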
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Diverse Musicological Studies
Methods: Contrastive Learning
