Breaking Speech Recognizers to Imagine Lyrics

Jon Gillick; David Bamman

arXiv:1912.06979 · cs.HC · December 17, 2019 · 1 citation

TL;DR

This paper applies a vocal source separator and a speech recognizer to non-speech audio, such as instrumental music or environmental sounds, and treats the resulting transcription as imagined lyrics for use in creative work.

Contribution

It repurposes two existing tools, a vocal source separation algorithm and an acoustic model trained to recognize isolated speech, by feeding the separator's "mistakes" on instrumental or environmental audio into the recognizer to produce imagined lyrics.

Findings

01. Successful generation of imagined lyrics from non-speech audio
02. Potential for machine-in-the-loop creative collaboration
03. Initial analysis shows promise for creative applications

Abstract

We introduce a new method for generating text, and in particular song lyrics, based on the speech-like acoustic qualities of a given audio file. We repurpose a vocal source separation algorithm and an acoustic model trained to recognize isolated speech, instead inputting instrumental music or environmental sounds. Feeding the "mistakes" of the vocal separator into the recognizer, we obtain a transcription of words imagined to be spoken in the input audio. We describe the key components of our approach, present initial analysis, and discuss the potential of the method for machine-in-the-loop collaboration in creative applications.
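The two-stage flow described in the abstract (separate, then recognize) might be sketched roughly as follows. This is an illustrative outline only, not the authors' implementation: `imagine_lyrics`, `toy_separator`, and `toy_recognizer` are hypothetical names, the fixed-size chunking is an assumption made to produce one lyric line per segment, and the two toy functions are trivial stand-ins for a real vocal separation model and a real speech acoustic model.

```python
from typing import Callable, List, Sequence

Audio = Sequence[float]  # mono samples in [-1, 1]; a simplifying assumption


def imagine_lyrics(
    audio: Audio,
    separate_vocals: Callable[[Audio], Audio],
    transcribe: Callable[[Audio], str],
    chunk_size: int = 16000,
) -> List[str]:
    """Run a vocal separator, then a speech recognizer, on non-speech audio.

    Whatever the separator wrongly attributes to a voice (its "mistakes")
    is passed to the recognizer; each non-empty transcription becomes one
    imagined lyric line. Chunking is an assumption of this sketch.
    """
    lines: List[str] = []
    for start in range(0, len(audio), chunk_size):
        chunk = audio[start:start + chunk_size]
        vocal_estimate = separate_vocals(chunk)  # stage 1: "vocals" in non-vocal audio
        text = transcribe(vocal_estimate).strip()  # stage 2: words heard in the residue
        if text:
            lines.append(text)
    return lines


def toy_separator(chunk: Audio) -> Audio:
    """Hypothetical stand-in for a separation model: zero out quiet samples."""
    return [x if abs(x) > 0.1 else 0.0 for x in chunk]


def toy_recognizer(chunk: Audio) -> str:
    """Hypothetical stand-in for an acoustic model: map mean level to a syllable."""
    level = sum(abs(x) for x in chunk) / max(len(chunk), 1)
    if level == 0:
        return ""
    return "la" if level < 0.5 else "oh"


if __name__ == "__main__":
    # Three short segments: near-silent, medium, loud.
    demo_audio = [0.05] * 4 + [0.3] * 4 + [0.9] * 4
    print(imagine_lyrics(demo_audio, toy_separator, toy_recognizer, chunk_size=4))
```

Passing the two stages in as callables keeps the sketch independent of any particular separation or recognition library; in practice each stand-in would be replaced by a trained model.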

Code & Models

Repositories

jrgillick/imagined-lyrics (PyTorch, official)

Taxonomy

Topics: Music and Audio Processing · Speech Recognition and Synthesis · Speech and Audio Processing