Song Emotion Classification of Lyrics with Out-of-Domain Data under Label Scarcity
Jonathan Sakunkoo, Annabella Sakunkoo

TL;DR
This paper explores training CNN models on a large out-of-domain dataset of Reddit comments for song lyric emotion classification, as a way around the scarcity of labeled lyric data, and reports satisfactory performance and generalizability.
Contribution
It demonstrates that out-of-domain data can effectively be used to train models for lyric emotion classification, offering a new approach to data scarcity challenges.
Findings
CNN trained on Reddit comments performs well on lyric emotion classification
Models trained on out-of-domain data generalize to lyric emotion classification
Leveraging large public datasets can mitigate in-domain data scarcity
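The evaluation setup behind these findings (train on out-of-domain text, test on lyrics) can be sketched in a few lines. This is only an illustration under stated assumptions: a small multinomial naive Bayes classifier stands in for the paper's CNN, and the example sentences, labels, and function names are hypothetical toy values, not taken from the paper or its datasets.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Whitespace tokenizer; a stand-in for real preprocessing.
    return text.lower().split()


def train_naive_bayes(examples):
    """Fit a multinomial naive Bayes model on (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return {"labels": label_counts, "words": word_counts, "vocab": vocab}


def predict(model, text):
    """Return the label with the highest log posterior (add-one smoothing)."""
    total_docs = sum(model["labels"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, count in model["labels"].items():
        score = math.log(count / total_docs)
        denom = sum(model["words"][label].values()) + vocab_size
        for tok in tokenize(text):
            if tok in model["vocab"]:  # words never seen in training are skipped
                score += math.log((model["words"][label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, examples):
    correct = sum(predict(model, text) == label for text, label in examples)
    return correct / len(examples)


# Hypothetical toy data: comment-style training text, lyric-style test text.
out_of_domain_train = [
    ("i am so happy about this great news", "joy"),
    ("this made my day what a wonderful surprise", "joy"),
    ("i feel so sad and alone tonight", "sadness"),
    ("crying again because i miss her so much", "sadness"),
]
in_domain_test = [
    ("what a wonderful happy day", "joy"),
    ("alone tonight and crying", "sadness"),
]

# Train only on the out-of-domain set, then score on the in-domain set.
model = train_naive_bayes(out_of_domain_train)
lyric_accuracy = accuracy(model, in_domain_test)
print(f"accuracy on lyric-style test set: {lyric_accuracy:.2f}")
```

The point of the sketch is the split: no lyric-style text is seen during training, so the test score reflects transfer across domains rather than in-domain fit.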
Abstract
Songs have been found to profoundly impact human emotions, with lyrics having significant power to stimulate emotional changes in the audience. There is a scarcity of large, high-quality in-domain datasets for lyrics-based song emotion classification (Edmonds and Sedoc, 2021; Zhou, 2022). It has been noted that in-domain training datasets are often difficult to acquire (Zhang and Miao, 2023) and that label acquisition is often limited by cost, time, and other factors (Azad et al., 2018). We examine the novel usage of a large out-of-domain dataset as a creative solution to the challenge of training data scarcity in the emotional classification of song lyrics. We find that CNN models trained on a large Reddit comments dataset achieve satisfactory performance and generalizability to lyrical emotion classification, thus giving insights into and a promising possibility in leveraging large,…
Taxonomy
Topics: Music and Audio Processing
