Embedding-based Music Emotion Recognition Using Composite Loss
Naoki Takashima, Frédéric Li, Marcin Grzegorzek, Kimiaki Shirahama

TL;DR
This paper introduces an embedding-based music emotion recognition method that captures both broad and fine-grained emotional variations by using a composite loss to optimize embeddings, demonstrating improved robustness and accuracy.
Contribution
It proposes a novel embedding approach with a composite loss function that considers both correlation and probabilistic similarity for more nuanced emotion recognition.
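As a rough illustration of how a loss combining a correlation term and a similarity term might be composed, here is a minimal NumPy sketch. It is not the authors' implementation. The per-dimension Pearson correlation is a simplified stand-in for canonical correlation analysis, the Gaussian-kernel similarity is an assumed placeholder because this summary does not specify the probabilistic similarity, and the names `correlation_term`, `similarity_term`, `composite_loss`, `alpha` and `sigma` are all hypothetical.

```python
import numpy as np


def correlation_term(music_emb, emotion_emb, eps=1e-8):
    """Mean per-dimension Pearson correlation between paired embeddings.

    Assumption: a simplified stand-in for a CCA-based correlation.
    Both inputs have shape (n_samples, n_dims); row i of each is a pair.
    """
    m = music_emb - music_emb.mean(axis=0)
    e = emotion_emb - emotion_emb.mean(axis=0)
    num = (m * e).sum(axis=0)
    den = np.sqrt((m ** 2).sum(axis=0) * (e ** 2).sum(axis=0)) + eps
    return float((num / den).mean())


def similarity_term(music_emb, emotion_emb, sigma=1.0):
    """Mean Gaussian-kernel similarity of each pair, in the range (0, 1].

    Assumption: a placeholder for the probabilistic similarity.
    """
    sq_dist = ((music_emb - emotion_emb) ** 2).sum(axis=1)
    return float(np.exp(-sq_dist / (2.0 * sigma ** 2)).mean())


def composite_loss(music_emb, emotion_emb, alpha=0.5):
    """Negative weighted sum of both terms, so lower is better.

    alpha is a hypothetical weight balancing the two terms.
    """
    corr = correlation_term(music_emb, emotion_emb)
    sim = similarity_term(music_emb, emotion_emb)
    return -(alpha * corr + (1.0 - alpha) * sim)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    music = rng.normal(size=(16, 4))
    aligned = music.copy()   # perfectly matched emotion embeddings
    opposed = -music         # anti-correlated emotion embeddings
    print(composite_loss(music, aligned))
    print(composite_loss(music, opposed))
```

In this sketch, perfectly aligned embeddings give a loss close to -1, and anti-correlated embeddings give a strictly larger loss. Minimising the loss therefore pulls matched music and emotion embeddings together under both criteria.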
Findings
Effective on two benchmark datasets
Robust bidirectional emotion recognition achieved
Improved discrimination within emotional categories
Abstract
Most music emotion recognition approaches perform classification or regression that estimates a general emotional category from a distribution of music samples, but without considering emotional variations (e.g., happiness can be further categorised into much, moderate or little happiness). We propose an embedding-based music emotion recognition approach that associates music samples with emotions in a common embedding space by considering both general emotional categories and fine-grained discrimination within each category. Since the association of music samples with emotions is uncertain due to subjective human perceptions, we compute composite loss-based embeddings obtained to maximise two statistical characteristics, one being the correlation between music samples and emotions based on canonical correlation analysis, and the other being a probabilistic similarity between a music…
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Music Technology and Sound Studies
