EMID: An Emotional Aligned Dataset in Audio-Visual Modality
Jialing Zou, Jiahao Mei, Guangze Ye, Tianyu Huai, Qiwei Shen, Daoguo Dong

TL;DR
This paper introduces EMID, a dataset of emotionally matched music and images. It supports cross-modal tasks such as generation and retrieval by pairing items for emotional consistency under a 13-dimension emotional model, and it is validated through a psychological experiment.
Contribution
The paper presents EMID, a novel dataset emphasizing emotional matching in audio-visual data, and introduces EMI-Adapter to improve cross-modal alignment methods.
Findings
Emotional alignment improves cross-modal matching accuracy.
Psychological experiments validate the effectiveness of emotional consistency.
EMID facilitates research in psychotherapy and emotion-aware AI.
Abstract
In this paper, we propose Emotionally paired Music and Image Dataset (EMID), a novel dataset designed for the emotional matching of music and images, to facilitate auditory-visual cross-modal tasks such as generation and retrieval. Unlike existing approaches that primarily focus on semantic correlations or roughly divided emotional relations, EMID emphasizes the significance of emotional consistency between music and images using an advanced 13-dimension emotional model. By incorporating emotional alignment into the dataset, it aims to establish pairs that closely align with human perceptual understanding, thereby raising the performance of auditory-visual cross-modal tasks. We also design a supplemental module named EMI-Adapter to optimize existing cross-modal alignment methods. To validate the effectiveness of the EMID, we conduct a psychological experiment, which has demonstrated…
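As a rough illustration of what emotional matching between music and images could look like in a retrieval setting, the sketch below scores pairs by cosine similarity between 13-dimensional emotion vectors. This is an assumption-laden toy, not the authors' pipeline or the EMI-Adapter: the function names (`cosine`, `best_match`), the vector layout, and the sample data are all hypothetical, and the dimensions are left as unnamed indices because the summary above does not list the 13 emotion categories.

```python
import math

DIMS = 13  # size of the emotion vector; the 13 categories are left unnamed here


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def best_match(music_vec, image_vecs):
    """Return (image_id, score) for the image whose emotion vector is closest to the music clip's."""
    if not image_vecs:
        raise ValueError("no candidate images")
    scored = ((image_id, cosine(music_vec, vec)) for image_id, vec in image_vecs.items())
    return max(scored, key=lambda pair: pair[1])


# Hypothetical toy data: each vector holds one weight per emotion dimension.
music = [0.0] * DIMS
music[0], music[1] = 0.7, 0.3

images = {
    "img_a": [0.6, 0.4] + [0.0] * (DIMS - 2),  # emotionally close to the clip
    "img_b": [0.0] * (DIMS - 1) + [1.0],       # weight on an unrelated dimension
}

match_id, score = best_match(music, images)
print(match_id, round(score, 3))
```

In this toy setup the clip is paired with `img_a`, since `img_b` shares no emotion dimension with it. A real system would obtain the emotion vectors from annotations or a trained model rather than hand-written weights.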
Taxonomy
Topics: Music and Audio Processing · Neuroscience and Music Perception · Multisensory perception and integration
