Joint sentiment analysis of lyrics and audio in music
Lea Schaab, Anna Kruspe

TL;DR
This paper evaluates separate and combined models for sentiment analysis in music using lyrics and audio, highlighting the benefits of multimodal approaches and discussing challenges like subjectivity and data scarcity.
Contribution
It introduces methods for combining lyrics and audio sentiment analysis and analyzes the causes of misclassifications and contradictions in multimodal music sentiment detection.
Findings
Combining lyrics and audio generally improves sentiment classification performance.
Different fusion approaches have varying effectiveness.
Key challenges include subjectivity and data scarcity.
Abstract
Sentiment or mood can express itself on various levels in music. In automatic analysis, the audio data itself is usually analyzed, but the lyrics can also play a crucial role in how moods are perceived. We first evaluate various models for sentiment analysis based on lyrics and audio separately. These approaches already show satisfactory results, but they also exhibit weaknesses, the causes of which we examine in more detail. Furthermore, different approaches to combining the audio and lyrics results are proposed and evaluated. Considering both modalities generally leads to improved performance. We investigate misclassifications and (including intentional) contradictions between audio and lyrics sentiment more closely, and identify possible causes. Finally, we address fundamental problems in this research area, such as high subjectivity, lack of data, and inconsistency in…
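The abstract mentions combining the audio and lyrics results but does not specify the fusion rules here. As a rough illustration only, the sketch below shows two generic late-fusion strategies over per-class probabilities from two independent models: a weighted average and "trust the more confident modality". It also shows a simple flag for when the two modalities contradict each other. The binary label set, the function names and the weighting parameter `w_audio` are all hypothetical assumptions, not the authors' method.

```python
from typing import List, Sequence

# Hypothetical label set; the paper's actual sentiment classes may differ.
LABELS = ("negative", "positive")


def average_fusion(p_audio: Sequence[float], p_lyrics: Sequence[float],
                   w_audio: float = 0.5) -> List[float]:
    """Weighted average of per-class probabilities (hypothetical late fusion)."""
    if len(p_audio) != len(p_lyrics):
        raise ValueError("both modalities must score the same classes")
    if not 0.0 <= w_audio <= 1.0:
        raise ValueError("w_audio must lie in [0, 1]")
    w_lyrics = 1.0 - w_audio
    return [w_audio * a + w_lyrics * l for a, l in zip(p_audio, p_lyrics)]


def max_confidence_fusion(p_audio: Sequence[float],
                          p_lyrics: Sequence[float]) -> List[float]:
    """Keep the distribution of whichever modality is more confident."""
    return list(p_audio) if max(p_audio) >= max(p_lyrics) else list(p_lyrics)


def predict(probs: Sequence[float]) -> str:
    """Return the label with the highest probability."""
    return LABELS[max(range(len(probs)), key=probs.__getitem__)]


def modalities_disagree(p_audio: Sequence[float],
                        p_lyrics: Sequence[float]) -> bool:
    """True if audio and lyrics models predict different labels."""
    return predict(p_audio) != predict(p_lyrics)


if __name__ == "__main__":
    audio = [0.3, 0.7]   # audio model leans positive
    lyrics = [0.8, 0.2]  # lyrics model leans negative
    print(modalities_disagree(audio, lyrics))              # True
    print(predict(average_fusion(audio, lyrics)))          # negative
    print(predict(average_fusion(audio, lyrics, 0.8)))     # positive
    print(predict(max_confidence_fusion(audio, lyrics)))   # negative
```

In this toy case the fused label depends on the weighting: with equal weights the more confident lyrics model wins, while weighting audio at 0.8 flips the result. This is one reason why different fusion choices can behave differently when the modalities contradict each other.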
Taxonomy
Topics: Music and Audio Processing
