Modelling Emotion Dynamics in Song Lyrics with State Space Models
Yingjin Song, Daniel Beck

TL;DR
This paper introduces a method that uses State Space Models to predict sentence-level emotion dynamics in song lyrics without requiring annotated songs, improving on sentence-level emotion predictors.
Contribution
It presents a new approach that combines a sentence-level emotion predictor with an Expectation-Maximization (EM) procedure to model emotion changes within songs without song-level supervision.
Findings
Improves emotion recognition performance over sentence-level baselines.
Effectively models emotion dynamics without annotated full songs.
Highlights limitations and future directions for emotion modeling.
Abstract
Most previous work in music emotion recognition assumes a single or a few song-level labels for the whole song. While it is known that different emotions can vary in intensity within a song, annotated data for this setup is scarce and difficult to obtain. In this work, we propose a method to predict emotion dynamics in song lyrics without song-level supervision. We frame each song as a time series and employ a State Space Model (SSM), combining a sentence-level emotion predictor with an Expectation-Maximization (EM) procedure to generate the full emotion dynamics. Our experiments show that applying our method consistently improves the performance of sentence-level baselines without requiring any annotated songs, making it ideal for limited training data scenarios. Further analysis through case studies shows the benefits of our method while also indicating the limitations and pointing to…
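The pipeline described in the abstract (per-sentence emotion scores treated as a time series, then refined with a State Space Model fitted by EM) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: a scalar random-walk (local-level) linear-Gaussian model, a Kalman filter with a Rauch-Tung-Striebel smoother as the inference step, fixed initial state parameters `m0` and `p0`, and the hypothetical function names `kalman_smooth` and `fit_em`. The input `scores` stands in for the output of any sentence-level emotion predictor.

```python
import numpy as np

def kalman_smooth(y, q, r, m0, p0):
    """Kalman filter + RTS smoother for the assumed model
    x_t = x_{t-1} + w_t (var q),  y_t = x_t + v_t (var r)."""
    T = len(y)
    mp, pp = np.zeros(T), np.zeros(T)   # predicted mean / variance
    mf, pf = np.zeros(T), np.zeros(T)   # filtered mean / variance
    for t in range(T):
        mp[t] = m0 if t == 0 else mf[t - 1]
        pp[t] = p0 if t == 0 else pf[t - 1] + q
        k = pp[t] / (pp[t] + r)         # Kalman gain
        mf[t] = mp[t] + k * (y[t] - mp[t])
        pf[t] = (1.0 - k) * pp[t]
    ms, ps = mf.copy(), pf.copy()       # smoothed mean / variance
    g = np.zeros(T - 1)                 # smoother gains
    for t in range(T - 2, -1, -1):
        g[t] = pf[t] / pp[t + 1]
        ms[t] = mf[t] + g[t] * (ms[t + 1] - mp[t + 1])
        ps[t] = pf[t] + g[t] ** 2 * (ps[t + 1] - pp[t + 1])
    return ms, ps, g

def fit_em(y, n_iter=20, q=0.1, r=0.1):
    """EM for the noise variances q and r; needs at least two sentences.
    No labelled songs are used: only the predictor's scores y."""
    y = np.asarray(y, dtype=float)
    m0, p0 = y[0], 1.0                  # fixed initial state (assumption)
    for _ in range(n_iter):
        # E-step: smoothed state moments under the current q, r
        ms, ps, g = kalman_smooth(y, q, r, m0, p0)
        # M-step: closed-form updates, floored to stay positive
        r = max(np.mean((y - ms) ** 2 + ps), 1e-6)
        cross = g * ps[1:]              # Cov(x_t, x_{t+1} | y)
        q = max(np.mean((ms[1:] - ms[:-1]) ** 2
                        + ps[1:] + ps[:-1] - 2.0 * cross), 1e-6)
    ms, _, _ = kalman_smooth(y, q, r, m0, p0)
    return ms, q, r

# Hypothetical per-sentence intensities for one emotion in one song.
scores = [0.2, 0.8, 0.3, 0.7, 0.6, 0.9, 0.4]
smoothed, q_hat, r_hat = fit_em(scores)
print(np.round(smoothed, 3), q_hat, r_hat)
```

In this sketch the smoothed sequence plays the role of the song's emotion dynamics, and the variances are learned from the unlabelled score sequence alone, which mirrors the "no annotated songs" setting in spirit only.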
Taxonomy
Topics: Music and Audio Processing · Time Series Analysis and Forecasting · Neural Networks and Applications
