On the Role of Visual Context in Enriching Music Representations

Kleanthis Avramidis; Shanti Stewart; Shrikanth Narayanan

arXiv:2210.15828 · cs.SD · October 31, 2022

TL;DR

This paper introduces VCMR, a contrastive learning framework that leverages visual context from music videos to improve music representations, enhancing robustness and interpretability for music tagging tasks.

Contribution

The study presents a novel multimodal contrastive learning approach that incorporates visual context from music videos to enrich audio-based music representations.
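
To make the idea of cross-modal contrastive learning concrete, here is a minimal sketch of a symmetric InfoNCE-style objective over paired audio and video embeddings. It is not taken from the paper or its repository: the function names, the temperature value, and the choice of a symmetric loss are all assumptions made for illustration, and the exact objective used by VCMR may differ.

```python
import math


def l2_normalize(vec):
    """Scale a non-zero vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cross_modal_info_nce(audio, video, temperature=0.1):
    """Hypothetical symmetric InfoNCE loss for a batch of paired embeddings.

    audio[i] and video[i] are assumed to come from the same music video clip
    (a positive pair); every other combination in the batch is a negative.
    """
    a = [l2_normalize(x) for x in audio]
    v = [l2_normalize(x) for x in video]
    n = len(a)

    # logits[i][j] = cosine similarity between audio i and video j, scaled by temperature
    logits = [
        [sum(p * q for p, q in zip(a[i], v[j])) / temperature for j in range(n)]
        for i in range(n)
    ]

    def mean_cross_entropy(rows):
        # Cross-entropy with the diagonal entry as the target, using a stable log-sum-exp
        total = 0.0
        for i, row in enumerate(rows):
            top = max(row)
            log_z = top + math.log(sum(math.exp(s - top) for s in row))
            total += log_z - row[i]
        return total / len(rows)

    # Average the audio-to-video and video-to-audio directions
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (mean_cross_entropy(logits) + mean_cross_entropy(transposed))
```

Under these assumptions, a batch whose audio and video embeddings are correctly paired yields a loss near zero, while shuffling the video side raises it, which is the pressure that pulls matching audio and visual content together in the shared space.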

Findings

01. Visual context improves music tagging performance.
02. Music representations become more robust with visual information.
03. The framework reveals how visual context influences musical elements.

Abstract

Human perception and experience of music is highly context-dependent. Contextual variability contributes to differences in how we interpret and interact with music, challenging the design of robust models for information retrieval. Incorporating multimodal context from diverse sources provides a promising approach toward modeling this variability. Music presented in media such as movies and music videos provides rich multimodal context that modulates underlying human experiences. However, such context modeling is underexplored, as it requires large amounts of multimodal data along with relevant annotations. Self-supervised learning can help address these challenges by automatically extracting rich, high-level correspondences between different modalities, hence alleviating the need for fine-grained annotations at scale. In this study, we propose VCMR -- Video-Conditioned Music…

Code & Models

Repositories

klean2050/vcmr (PyTorch, official)

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Music Technology and Sound Studies