LyCon: Lyrics Reconstruction from the Bag-of-Words Using Large Language Models
Haven Kim, Kahyun Choi

TL;DR
This paper presents LyCon, a method that uses large language models and metadata to reconstruct copyright-free lyrics from Bag-of-Words datasets, enabling lyric research without copyright issues.
Contribution
The study introduces a novel approach for generating lyrics from BoW datasets using metadata and large language models, and provides a publicly available dataset of reconstructed lyrics.
Findings
Successfully reconstructed lyrics aligned with metadata
Created a publicly accessible dataset of reconstructed lyrics
Enabled new research possibilities in lyric studies
Abstract
This paper addresses a distinctive challenge in lyric studies: direct use of lyrics is often restricted by copyright. Unlike much other web data, lyrics sourced from the internet are frequently protected under copyright law, which calls for alternative approaches. Our study introduces a novel method for generating copyright-free lyrics from publicly available Bag-of-Words (BoW) datasets, which contain the vocabulary of lyrics but not the lyrics themselves. Using the metadata associated with BoW datasets together with large language models, we successfully reconstructed lyrics. We have compiled a dataset of reconstructed lyrics, LyCon, aligned with metadata from well-known sources including the Million Song Dataset, the Deezer Mood Detection Dataset, and the AllMusic Genre Dataset, and made it publicly available. We believe that the integration of metadata such as mood…
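To make concrete what a BoW dataset keeps and what it discards, here is a minimal sketch in Python. It shows that word order is lost (two different lines yield the same bag), parses a BoW record, and assembles a text prompt from the bag plus metadata. The record layout (`track_id,other_id,index:count,...` with 1-based indices into a vocabulary list) is an assumption modeled loosely on musiXmatch-style releases, and the helper names and prompt wording are hypothetical; this is not the authors' pipeline or prompt. Real BoW releases may also stem words and cap the vocabulary, which this sketch does not model.

```python
from collections import Counter


def bag_of_words(lyrics: str) -> Counter:
    """Lowercase, split on whitespace, and count words; word order is discarded."""
    return Counter(lyrics.lower().split())


def parse_bow_line(line: str, vocab: list[str]) -> tuple[str, Counter]:
    """Parse a hypothetical 'track_id,other_id,idx:count,...' record.

    Assumption: indices are 1-based positions in `vocab`.
    """
    track_id, _other_id, *pairs = line.strip().split(",")
    bow = Counter()
    for pair in pairs:
        idx, count = pair.split(":")
        bow[vocab[int(idx) - 1]] = int(count)
    return track_id, bow


def build_prompt(bow: Counter, metadata: dict) -> str:
    """Assemble an illustrative LLM prompt from a word bag and song metadata."""
    words = ", ".join(f"{w} (x{c})" for w, c in sorted(bow.items()))
    meta = "; ".join(f"{k}: {v}" for k, v in sorted(metadata.items()))
    return (
        "Write original song lyrics using mainly these words "
        f"(with approximate counts): {words}.\n"
        f"Song metadata: {meta}."
    )


if __name__ == "__main__":
    # Different word orders collapse to the same bag: the original text is unrecoverable.
    print(bag_of_words("Love me do") == bag_of_words("do ME love"))
    track_id, bow = parse_bow_line("TR001,42,1:3,3:2", ["i", "the", "you"])
    print(track_id, dict(bow))
    print(build_prompt(bow, {"mood": "happy", "genre": "rock"}))
```

Because the bag keeps only word counts, the metadata (for example mood or genre) is what gives a language model additional context to produce new lyrics that stay consistent with the song, rather than recover the original text.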
Taxonomy
Topics: Music and Audio Processing · Natural Language Processing Techniques · Computational and Text Analysis Methods
