Evaluating Creative Language Generation: The Case of Rap Lyric Ghostwriting
Peter Potash, Alexey Romanov, Anna Rumshisky

TL;DR
This paper develops new evaluation methods for creative language generation, using rap lyric ghostwriting as the test case. The methods assess whether generated verse matches the emulated artist's style while remaining distinct in content.
Contribution
It introduces a novel evaluation methodology and provides a corpus of lyrics for 13 rap artists, annotated for stylistic similarity, for analyzing system performance.
Findings
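Evaluation methodology addresses several complementary aspects of the task, including stylistic similarity to the emulated artist
Annotated corpus allows the feasibility of manual evaluation of generated verse to be assessed
Framework gives the task an explicit, quantifiable foundation for future research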
Abstract
Language generation tasks that seek to mimic human ability to use language creatively are difficult to evaluate, since one must consider creativity, style, and other non-trivial aspects of the generated text. The goal of this paper is to develop evaluation methods for one such task, ghostwriting of rap lyrics, and to provide an explicit, quantifiable foundation for the goals and future directions of this task. Ghostwriting must produce text that is similar in style to the emulated artist, yet distinct in content. We develop a novel evaluation methodology that addresses several complementary aspects of this task, and illustrate how such evaluation can be used to meaningfully analyze system performance. We provide a corpus of lyrics for 13 rap artists, annotated for stylistic similarity, which allows us to assess the feasibility of manual evaluation for generated verse.
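To make the "distinct in content" requirement concrete, here is a minimal sketch of one way it could be quantified. It assumes a plain bag-of-words cosine similarity, which is an illustrative choice and not a metric stated in the abstract. The function names (`tokenize`, `cosine_similarity`, `max_content_similarity`) are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase the text and keep runs of letters and apostrophes as tokens.
    return re.findall(r"[a-z']+", text.lower())


def cosine_similarity(a, b):
    # Cosine similarity between two bag-of-words Counters.
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def max_content_similarity(generated_verse, artist_verses):
    # Assumption: content overlap is approximated by the highest similarity
    # between the generated verse and any existing verse by the artist.
    # Values near 1.0 suggest copying; lower values suggest novel content.
    generated = Counter(tokenize(generated_verse))
    return max(
        (cosine_similarity(generated, Counter(tokenize(v))) for v in artist_verses),
        default=0.0,
    )
```

Under this sketch, a ghostwriting system would aim for a low maximum similarity to the artist's existing verses while still scoring well on a separate measure of style. The style measure is left out here because the abstract does not specify one.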
