# Learning Affective Correspondence between Music and Image

**Authors:** Gaurav Verma, Eeshan Gunesh Dhekane, Tanaya Guha

arXiv: 1904.00150 · 2019-04-18

## TL;DR

This paper proposes a deep learning approach for deciding whether a music clip and an image carry similar emotion. It builds a large dataset for the task and shows that this crossmodal affective correspondence can be predicted more accurately than by two competitive baselines.

## Contribution

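It introduces a neural network that projects music and images into a common representation space and classifies whether a music-image pair shares emotion content. It also constructs a large-scale database of music clips and images, labeled with three emotion classes, for training and evaluation.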
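The abstract defines a pair as having true correspondence when the music clip and the image share an emotion class. Below is a minimal sketch of how labeled true/false pairs could be drawn from such a database. The balanced 50/50 sampling, the function name `make_pairs`, and the toy IDs are hypothetical illustrations, not the paper's documented procedure.

```python
import random
from collections import defaultdict
from typing import List, Tuple

def make_pairs(music: List[Tuple[str, str]],
               images: List[Tuple[str, str]],
               n_pairs: int,
               seed: int = 0) -> List[Tuple[str, str, int]]:
    """Return (music_id, image_id, label) triples.

    label is 1 (true correspondence) when both items share an emotion
    class, else 0. Balanced sampling is an assumption for illustration.
    """
    rng = random.Random(seed)
    # Group image IDs by emotion class so matching images can be drawn quickly.
    by_emotion = defaultdict(list)
    for img_id, emo in images:
        by_emotion[emo].append(img_id)

    pairs = []
    for _ in range(n_pairs):
        m_id, m_emo = rng.choice(music)
        if rng.random() < 0.5:
            # True correspondence: image drawn from the same emotion class.
            img_id = rng.choice(by_emotion[m_emo])
            label = 1
        else:
            # False correspondence: image drawn from a different emotion class.
            other = rng.choice([e for e in by_emotion if e != m_emo])
            img_id = rng.choice(by_emotion[other])
            label = 0
        pairs.append((m_id, img_id, label))
    return pairs
```

For example, `make_pairs([("m1", "positive")], [("i1", "positive"), ("i2", "negative")], 10)` yields ten triples whose label is 1 exactly when the image is `i1`. Balancing true and false pairs keeps a trivial "always false" classifier from looking accurate, which would otherwise happen with three roughly equal classes.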

## Key findings

- Achieves 61.67% accuracy on the binary (true vs. false) affective correspondence prediction task
- Learns modality-specific emotion representations without being trained on explicit emotion labels
- Outperforms two relevant and competitive baselines
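One simple way to check whether learned embeddings carry emotion information is to fit a nearest-centroid classifier on them. The sketch below assumes embeddings are plain float vectors already extracted from one modality's branch. The probe itself and the names `fit_centroids` and `predict_emotion` are hypothetical, not the evaluation protocol reported in the paper.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence

def fit_centroids(embeddings: Sequence[Sequence[float]],
                  labels: Sequence[str]) -> Dict[str, List[float]]:
    """Average the embeddings of each emotion class into one centroid."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = defaultdict(int)
    for emb, label in zip(embeddings, labels):
        if label not in sums:
            sums[label] = [0.0] * len(emb)
        sums[label] = [s + v for s, v in zip(sums[label], emb)]
        counts[label] += 1
    return {lab: [s / counts[lab] for s in vec] for lab, vec in sums.items()}

def predict_emotion(centroids: Dict[str, List[float]],
                    emb: Sequence[float]) -> str:
    """Assign the emotion class whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda lab: math.dist(centroids[lab], emb))
```

If embeddings from a network trained only on correspondence labels separate well under such a probe, that supports the claim that emotion structure emerged without explicit emotion supervision.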

## Abstract

We introduce the problem of learning affective correspondence between audio (music) and visual data (images). For this task, a music clip and an image are considered similar (having true correspondence) if they have similar emotion content. In order to estimate this crossmodal, emotion-centric similarity, we propose a deep neural network architecture that learns to project the data from the two modalities to a common representation space, and performs a binary classification task of predicting the affective correspondence (true or false). To facilitate the current study, we construct a large-scale database containing more than $3{,}500$ music clips and $85{,}000$ images with three emotion classes (positive, neutral, negative). The proposed approach achieves $61.67\%$ accuracy for the affective correspondence prediction task on this database, outperforming two relevant and competitive baselines. We also demonstrate that our network learns modality-specific representations of emotion (without explicitly being trained with emotion labels), which are useful for emotion recognition in individual modalities.
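The abstract describes a two-branch design: each modality is projected into a common representation space, and a binary classifier predicts whether the pair corresponds. Below is a minimal, untrained forward-pass sketch of that idea in plain Python. The single linear+ReLU layer per branch, fusion by concatenation, the logistic head, and the names `CorrespondenceNet`, `music_dim`, `image_dim`, and `common_dim` are all assumptions standing in for the paper's deeper networks. Training (for example with binary cross-entropy) is omitted.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]

def linear(w: Matrix, b: Vector, x: Vector) -> Vector:
    """Affine map w @ x + b, with w stored as a list of rows."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]

def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]

def sigmoid(t: float) -> float:
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def init_layer(n_out: int, n_in: int, rng: random.Random) -> Tuple[Matrix, Vector]:
    """Uniform weights scaled by 1/sqrt(n_in); zero biases."""
    scale = 1.0 / math.sqrt(n_in)
    w = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out

class CorrespondenceNet:
    """Hypothetical two-branch model: per-modality projection into a shared
    space, then a logistic head over the concatenated embeddings."""

    def __init__(self, music_dim: int, image_dim: int, common_dim: int, seed: int = 0):
        rng = random.Random(seed)
        self.music_proj = init_layer(common_dim, music_dim, rng)
        self.image_proj = init_layer(common_dim, image_dim, rng)
        # Concatenation fusion is an assumed choice, not the paper's stated one.
        self.head = init_layer(1, 2 * common_dim, rng)

    def embed_music(self, x: Vector) -> Vector:
        return relu(linear(*self.music_proj, x))

    def embed_image(self, x: Vector) -> Vector:
        return relu(linear(*self.image_proj, x))

    def prob_true(self, music_feat: Vector, image_feat: Vector) -> float:
        """Probability that the pair has true affective correspondence."""
        z = self.embed_music(music_feat) + self.embed_image(image_feat)
        return sigmoid(linear(*self.head, z)[0])
```

Usage: `net = CorrespondenceNet(music_dim=4, image_dim=6, common_dim=3)` followed by `net.prob_true(music_vec, image_vec) >= 0.5` gives a true/false prediction. Because each branch has its own weights, `embed_music` and `embed_image` can also be used on their own, which is what makes modality-specific emotion probing possible.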

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1904.00150/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/1904.00150/full.md

## References

27 references — full list in the complete paper: https://tomesphere.com/paper/1904.00150/full.md

---
Source: https://tomesphere.com/paper/1904.00150