Synthetic Lyrics Detection Across Languages and Genres
Yanis Labrak, Markus Frohmann, Gabriel Meseguer-Brocal, Elena V. Epure

TL;DR
This paper investigates the detection of AI-generated lyrics across multiple languages and genres, creating a diverse dataset and evaluating detection methods to improve transparency and address copyright concerns.
Contribution
It introduces a new dataset of real and synthetic lyrics across languages and genres, and evaluates and adapts detection methods for lyrics, a previously unexplored domain.
Findings
Detection methods show promising results across languages and genres.
Unsupervised domain adaptation improves detection accuracy.
Methods generalize well with limited data in few-shot scenarios.
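As an illustration only of the few-shot setting mentioned in the findings, the sketch below classifies lyrics as real or synthetic with a k-nearest-neighbour vote over a handful of labelled examples. It is not the authors' pipeline: the function `knn_predict`, the two-dimensional feature vectors, and all values are hypothetical stand-ins for whatever detector features a practitioner would extract.

```python
from math import dist

def knn_predict(support, query, k=3):
    """Majority vote among the k labelled examples closest to `query`.

    `support` is a list of (feature_vector, label) pairs, with label
    either "real" or "synthetic". Hypothetical helper, for illustration.
    """
    nearest = sorted(support, key=lambda item: dist(item[0], query))[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)

# Made-up 2-D features; in practice these would come from a detector
# (the feature choice here is an assumption, not taken from the paper).
support = [
    ((0.90, 0.80), "real"),
    ((0.80, 0.90), "real"),
    ((0.85, 0.75), "real"),
    ((0.20, 0.30), "synthetic"),
    ((0.30, 0.20), "synthetic"),
    ((0.25, 0.35), "synthetic"),
]

# Held-out toy queries with their true labels.
queries = [((0.82, 0.85), "real"), ((0.28, 0.25), "synthetic")]

predictions = [knn_predict(support, features) for features, _ in queries]
accuracy = sum(p == y for p, (_, y) in zip(predictions, queries)) / len(queries)
print(predictions, accuracy)
```

With only three labelled examples per class, the vote depends entirely on how well the chosen features separate the two classes, so feature quality matters more than the classifier in this kind of setup.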
Abstract
In recent years, the use of large language models (LLMs) to generate music content, particularly lyrics, has gained in popularity. These advances provide valuable tools for artists and enhance their creative processes, but they also raise concerns about copyright violations, consumer satisfaction, and content spamming. Previous research has explored content detection in various domains. However, no work has focused on the text modality, lyrics, in music. To address this gap, we curated a diverse dataset of real and synthetic lyrics from multiple languages, music genres, and artists. The generation pipeline was validated using both humans and automated methods. We performed a thorough evaluation of existing synthetic text detection approaches on lyrics, a previously unexplored data type. We also investigated methods to adapt the best-performing features to lyrics through unsupervised…
Taxonomy
Topics: Music and Audio Processing · Diverse Musicological Studies
