Lyric document embeddings for music tagging
Matt McVicar, Bruno Di Giorgi, Baris Dundar, Matthias Mauch

TL;DR
This paper investigates various methods for creating fixed-dimensional lyric embeddings for music tagging, finding that simple averaging of pretrained embeddings often outperforms complex neural models across multiple tasks.
Contribution
It provides an extensive empirical comparison of token-level and document-level lyric embedding methods on a large-scale dataset for music tagging.
Findings
Averaging pretrained word embeddings is competitive with, and on many downstream metrics better than, recurrent and attention-based neural architectures.
The study covers diverse tagging tasks like genre, explicit content, and era detection.
Abstract
We present an empirical study on embedding the lyrics of a song into a fixed-dimensional feature for the purpose of music tagging. Five methods of computing token-level and four methods of computing document-level representations are trained on an industrial-scale dataset of tens of millions of songs. We compare simple averaging of pretrained embeddings to modern recurrent and attention-based neural architectures. Evaluating on a wide range of tagging tasks such as genre classification, explicit content identification and era detection, we find that averaging word embeddings outperforms more complex architectures in many downstream metrics.
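As a rough illustration of the averaging baseline the abstract describes, the sketch below turns a lyric string into one fixed-dimensional vector by taking the mean of its token vectors. It is not the authors' implementation: the three-dimensional embedding table, the whitespace tokenizer, the zero-vector fallback for lyrics with no known tokens, and the name embed_lyrics are all assumptions made for this example.

```python
import numpy as np

# Hypothetical toy embedding table (assumption). A real system would load
# pretrained word vectors with hundreds of dimensions instead.
EMBEDDINGS = {
    "love": np.array([1.0, 0.0, 0.0]),
    "night": np.array([0.0, 1.0, 0.0]),
    "dance": np.array([0.0, 0.0, 1.0]),
}
DIM = 3


def embed_lyrics(lyrics, table=EMBEDDINGS, dim=DIM):
    """Return the mean vector of all known tokens in `lyrics`.

    Tokens are obtained by lowercasing and splitting on whitespace
    (an assumption; the source does not describe its tokenization).
    Unknown tokens are skipped. If no token is known, a zero vector
    is returned so the output always has the same dimension.
    """
    tokens = lyrics.lower().split()
    vectors = [table[t] for t in tokens if t in table]
    if not vectors:
        return np.zeros(dim)
    return np.mean(vectors, axis=0)


# The output has the same shape regardless of lyric length.
short_doc = embed_lyrics("Love love dance tonight")
empty_doc = embed_lyrics("")
```

Because the output dimension does not depend on the number of tokens, vectors like short_doc can be passed directly to any standard classifier for a tagging task.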
Taxonomy
Topics: Music and Audio Processing · Music Technology and Sound Studies · Diverse Musicological Studies
