Breaking Speech Recognizers to Imagine Lyrics
Jon Gillick, David Bamman

TL;DR
This paper runs a vocal source separator and a speech recognizer on non-speech audio, such as instrumental music or environmental sounds, and treats the resulting transcription as "imagined" lyrics for creative use.
Contribution
It repurposes a vocal source separation algorithm and an acoustic model trained on isolated speech, applying them to instrumental and environmental sounds to produce imagined lyrics.
Findings
Successful generation of imagined lyrics from non-speech audio
Potential for machine-in-the-loop creative collaboration
Initial analysis shows promise for creative applications
Abstract
We introduce a new method for generating text, and in particular song lyrics, based on the speech-like acoustic qualities of a given audio file. We repurpose a vocal source separation algorithm and an acoustic model trained to recognize isolated speech, instead inputting instrumental music or environmental sounds. Feeding the "mistakes" of the vocal separator into the recognizer, we obtain a transcription of words imagined to be spoken in the input audio. We describe the key components of our approach, present initial analysis, and discuss the potential of the method for machine-in-the-loop collaboration in creative applications.
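The two-stage flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name imagine_lyrics and both stand-in components (toy_separator, toy_recognizer) are hypothetical, and the toy logic only mimics the shape of the pipeline (separator output fed to a recognizer), not any real separation or acoustic model.

```python
from typing import Callable, List, Sequence

Audio = Sequence[float]  # mono samples in [-1, 1]


def imagine_lyrics(
    audio: Audio,
    separate_vocals: Callable[[Audio], Audio],
    recognize: Callable[[Audio], List[str]],
) -> List[str]:
    """Chain a vocal separator and a speech recognizer on non-speech audio."""
    # On instrumental or environmental input there is no true vocal track,
    # so whatever the separator returns is its "mistake".
    vocal_estimate = separate_vocals(audio)
    # The recognizer then has to explain that residue as words.
    return recognize(vocal_estimate)


# --- Toy stand-ins (hypothetical, NOT the models used in the paper) ---

def toy_separator(audio: Audio, threshold: float = 0.5) -> Audio:
    """Pretend that loud samples are 'vocal' and zero out the rest."""
    return [s if abs(s) >= threshold else 0.0 for s in audio]


def toy_recognizer(audio: Audio, frame: int = 4) -> List[str]:
    """Map each non-silent frame to a word from a tiny made-up vocabulary."""
    vocab = ["oh", "la", "hey", "yeah"]
    words: List[str] = []
    for start in range(0, len(audio), frame):
        chunk = audio[start:start + frame]
        active = sum(1 for s in chunk if s != 0.0)
        if active > 0:
            # Choose a word by the number of active samples in the frame,
            # which keeps the toy output deterministic.
            words.append(vocab[(active - 1) % len(vocab)])
    return words


if __name__ == "__main__":
    instrumental = [0.1, 0.9, -0.7, 0.2, 0.0, 0.1, 0.0, 0.2, 0.6, 0.6, 0.6, -0.8]
    print(imagine_lyrics(instrumental, toy_separator, toy_recognizer))
```

Passing the separator and recognizer as callables keeps the sketch independent of any particular library; a real separation model and acoustic model could be substituted as long as they match these assumed signatures.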
Taxonomy
Topics: Music and Audio Processing · Speech Recognition and Synthesis · Speech and Audio Processing
