SongSage: A Large Musical Language Model with Lyric Generative Pre-training

Jiani Guo; Jiajia Li; Jie Wu; Zuchao Li; Yujiu Yang; Ping Wang

arXiv:2601.01153 · cs.CL · January 6, 2026

TL;DR

SongSage is a large musical language model trained on lyric-focused data. It demonstrates strong lyric understanding, query rewriting, and lyric generation capabilities.

Contribution

Introduces SongSage, a novel lyric-centric language model trained on LyricBank, with extensive fine-tuning for diverse lyric-related tasks, improving music AI applications.

Findings

01 Performs strongly on lyric rewriting and generation tasks

02 Achieves strong lyric-centric knowledge understanding

03 Maintains general knowledge proficiency with a competitive MMLU score

Abstract

Large language models have achieved significant success in various domains, yet their understanding of lyric-centric knowledge has not been fully explored. In this work, we first introduce PlaylistSense, a dataset to evaluate the playlist understanding capability of language models. PlaylistSense encompasses ten types of user queries derived from common real-world perspectives, challenging LLMs to accurately grasp playlist features and address diverse user intents. Comprehensive evaluations indicate that current general-purpose LLMs still have potential for improvement in playlist understanding. Inspired by this, we introduce SongSage, a large musical language model equipped with diverse lyric-centric intelligence through lyric generative pretraining. SongSage undergoes continual pretraining on LyricBank, a carefully curated corpus of 5.48 billion tokens focused on lyrical content,…

Taxonomy

Topics: Music and Audio Processing · Machine Learning in Materials Science · Music Technology and Sound Studies