Emotion-Aligned Contrastive Learning Between Images and Music

Shanti Stewart; Kleanthis Avramidis; Tiantian Feng; Shrikanth Narayanan

arXiv:2308.12610·cs.MM·December 10, 2024

TL;DR

This paper introduces an emotion-aligned contrastive learning method for retrieving music from image queries. It learns a joint image-music embedding space that captures affective qualities, trained with an emotion-supervised, cross-modal adaptation of the SupCon loss.

Contribution

The paper proposes an emotion-supervised contrastive learning approach that aligns images and music in a shared embedding space for affect-based retrieval.

Findings

01. Effective cross-modal retrieval of images and music based on emotion labels

02. The learned embeddings generalize well to automatic music tagging

03. Successful alignment of images and music in the joint embedding space
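The first finding concerns retrieving music for an image query according to emotion labels. As a rough illustration of what such an evaluation could look like, the sketch below ranks music embeddings by cosine similarity to an image embedding and computes precision at k over shared emotion labels. The function names, the toy 2-D embeddings, and the choice of cosine similarity and precision at k are all assumptions made for illustration; this summary does not state which metrics or similarity measure the paper uses.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def retrieve(query, candidates, k):
    """Return indices of the k candidates most similar to the query."""
    ranked = sorted(
        range(len(candidates)),
        key=lambda i: cosine(query, candidates[i]),
        reverse=True,
    )
    return ranked[:k]

def precision_at_k(query_label, candidate_labels, top_indices):
    """Fraction of retrieved items that share the query's emotion label."""
    hits = sum(candidate_labels[i] == query_label for i in top_indices)
    return hits / len(top_indices)

# Toy, made-up embeddings in a shared 2-D space (not data from the paper).
image_query = [1.0, 0.1]  # hypothetical image embedding labeled "happy"
music_embs = [[0.9, 0.2], [-1.0, 0.0], [0.8, -0.1], [0.0, 1.0]]
music_labels = ["happy", "sad", "happy", "calm"]

top = retrieve(image_query, music_embs, k=2)
p_at_2 = precision_at_k("happy", music_labels, top)
```

In this toy setup the two "happy" clips lie closest to the image query, so both retrieved items match its label. Real embeddings would come from the trained image and audio encoders rather than hand-written vectors.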

Abstract

Traditional music search engines rely on retrieval methods that match natural language queries with music metadata. There have been increasing efforts to expand retrieval methods to consider the audio characteristics of music itself, using queries of various modalities including text, video, and speech. While most approaches aim to match general music semantics to the input queries, only a few focus on affective qualities. In this work, we address the task of retrieving emotionally-relevant music from image queries by learning an affective alignment between images and music audio. Our approach focuses on learning an emotion-aligned joint embedding space between images and music. This embedding space is learned via emotion-supervised contrastive learning, using an adapted cross-modal version of the SupCon loss. We evaluate the joint embeddings through cross-modal retrieval tasks…
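The abstract mentions an adapted cross-modal version of the SupCon loss without giving its form here. The following is a minimal sketch of one plausible adaptation, assuming anchors come from one modality and positives are the items of the other modality that share the anchor's emotion label. The function names, the temperature value, and the symmetric averaging of both directions are assumptions for illustration, and the paper's exact formulation may differ.

```python
import math

def normalize(v):
    """Scale a vector to unit L2 norm."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def cross_modal_supcon(anchors, anchor_labels, others, other_labels, temperature=0.1):
    """SupCon-style loss: anchors from one modality, positives and
    negatives from the other. Positives share the anchor's emotion label."""
    anchors = [normalize(a) for a in anchors]
    others = [normalize(o) for o in others]
    total, counted = 0.0, 0
    for a, label in zip(anchors, anchor_labels):
        logits = [sum(x * y for x, y in zip(a, o)) / temperature for o in others]
        # Log-sum-exp over all cross-modal candidates (max-shifted for stability).
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        positives = [z for z, lab in zip(logits, other_labels) if lab == label]
        if not positives:
            continue  # anchor has no same-emotion item in the other modality
        total += -sum(z - log_denom for z in positives) / len(positives)
        counted += 1
    return total / counted if counted else 0.0

def symmetric_loss(img, img_labels, mus, mus_labels, temperature=0.1):
    """Average of image-to-music and music-to-image losses (an assumption)."""
    i2m = cross_modal_supcon(img, img_labels, mus, mus_labels, temperature)
    m2i = cross_modal_supcon(mus, mus_labels, img, img_labels, temperature)
    return 0.5 * (i2m + m2i)

# Toy check with made-up embeddings: matching labels give a much lower loss.
images = [[1.0, 0.0], [0.0, 1.0]]
music = [[1.0, 0.0], [0.0, 1.0]]
aligned = symmetric_loss(images, ["happy", "sad"], music, ["happy", "sad"])
misaligned = symmetric_loss(images, ["happy", "sad"], music, ["sad", "happy"])
```

Minimizing a loss of this kind pulls same-emotion image and music embeddings together and pushes different-emotion pairs apart, which is the sense in which the joint space is emotion-aligned.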

Code & Models

Repositories

shantistewart/emo-clim (PyTorch, official)

Taxonomy

Topics: Music and Audio Processing · Diverse Musicological Studies · Speech and Audio Processing