MERGE -- A Bimodal Audio-Lyrics Dataset for Static Music Emotion Recognition

Pedro Lima Louro; Hugo Redinho; Ricardo Santos; Ricardo Malheiro; Renato Panda; Rui Pedro Paiva

arXiv:2407.06060 · cs.SD · June 19, 2025 · 1 citation

TL;DR

This paper introduces MERGE, a set of three large, quality-controlled bimodal audio-lyrics datasets for music emotion recognition, enabling improved benchmarking and development of multimodal systems.

Contribution

The paper presents three new publicly available bimodal datasets created with a semi-automatic approach, along with baseline experiments and validated data splits for music emotion recognition research.

Findings

01. Achieved up to 81.74% F1-score in bimodal classification

02. Validated the datasets' viability through extensive experiments

03. Provided standardized train-validation-test splits for benchmarking
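
For readers unfamiliar with the metric in the first finding, the following is a minimal sketch of how an F1-score can be computed for a multi-class emotion classifier. It assumes macro averaging (the unweighted mean of per-class F1); this page does not state which averaging the authors used, and the function name `macro_f1` and the class labels `Q1` and `Q2` are hypothetical, chosen only for illustration.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Return the unweighted mean of per-class F1 scores.

    Assumption: macro averaging. The source page does not say which
    averaging scheme produced the reported F1-score.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class gets a false positive
            fn[truth] += 1   # true class gets a false negative

    scores = []
    for label in labels:
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels, for illustration only
example = macro_f1(["Q1", "Q1", "Q2", "Q2"], ["Q1", "Q2", "Q2", "Q2"])
```

In this toy example one class has an F1 of 2/3 and the other 4/5, so the macro average is 11/15. A weighted average would instead scale each class by its number of true samples, which matters when classes are imbalanced.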

Abstract

The Music Emotion Recognition (MER) field has seen steady developments in recent years, with contributions from feature engineering, machine learning, and deep learning. The landscape has also shifted from audio-centric systems to bimodal ensembles that combine audio and lyrics. However, a lack of public, sizable and quality-controlled bimodal databases has hampered the development and improvement of bimodal audio-lyrics systems. This article proposes three new audio, lyrics, and bimodal MER research datasets, collectively referred to as MERGE, which were created using a semi-automatic approach. To comprehensively assess the proposed datasets and establish a baseline for benchmarking, we conducted several experiments for each modality, using feature engineering, machine learning, and deep learning methodologies. Additionally, we propose and validate fixed train-validation-test splits.…
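
The idea of a fixed train-validation-test split mentioned in the abstract can be illustrated with a short sketch. This is not the authors' procedure: the 70/15/15 ratios, the seed, the function name `stratified_split`, and the song ids and class labels in the usage lines are all assumptions made for illustration. The sketch only shows why fixing the seed and stratifying by class yields a partition that every experiment can reuse.

```python
import random
from collections import defaultdict


def stratified_split(items, ratios=(0.7, 0.15, 0.15), seed=0):
    """Split (song_id, label) pairs into fixed train/validation/test sets.

    Assumptions: the ratios and seed are illustrative defaults, not
    values taken from the paper.
    """
    by_label = defaultdict(list)
    for song_id, label in items:
        by_label[label].append(song_id)

    rng = random.Random(seed)  # local generator, so the split is reproducible
    splits = {"train": [], "validation": [], "test": []}
    for label in sorted(by_label):
        ids = sorted(by_label[label])  # sort first so input order does not matter
        rng.shuffle(ids)
        n_train = round(len(ids) * ratios[0])
        n_val = round(len(ids) * ratios[1])
        splits["train"] += ids[:n_train]
        splits["validation"] += ids[n_train:n_train + n_val]
        splits["test"] += ids[n_train + n_val:]  # remainder goes to test
    return splits


# Hypothetical data: 80 songs spread evenly over four classes
songs = [(f"song{i:03d}", f"Q{i % 4 + 1}") for i in range(80)]
fixed = stratified_split(songs)
```

Because the generator is seeded and the ids are sorted before shuffling, calling the function again on the same songs returns the same partition, which is the property that makes results comparable across systems.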

Taxonomy

Topics: Music and Audio Processing · Speech Recognition and Synthesis · Speech and Audio Processing