Catch-A-Waveform: Learning to Generate Audio from a Single Short Example
Gal Greshler, Tamar Rott Shaham, Tomer Michaeli

TL;DR
This paper introduces a GAN-based model capable of learning from as little as 20 seconds of a single audio example to generate diverse, semantically similar audio samples, enabling various creative and restorative audio applications.
Contribution
The paper presents a novel single-example audio generation model that does not require pre-training or external supervision, achieving state-of-the-art results with minimal training data.
Findings
Effective with as little as 20 seconds of training audio
Can generate diverse, semantically similar audio samples
Enables applications like inpainting, super-resolution, and creative remixing
Abstract
Models for audio generation are typically trained on hours of recordings. Here, we illustrate that capturing the essence of an audio source is typically possible from as little as a few tens of seconds from a single training signal. Specifically, we present a GAN-based generative model that can be trained on one short audio signal from any domain (e.g. speech, music, etc.) and does not require pre-training or any other form of external supervision. Once trained, our model can generate random samples of arbitrary duration that maintain semantic similarity to the training waveform, yet exhibit new compositions of its audio primitives. This enables a long line of interesting applications, including generating new jazz improvisations or new a-cappella rap variants based on a single short example, producing coherent modifications to famous songs (e.g. adding a new verse to a Beatles song…
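The abstract does not say how samples of arbitrary duration are produced. One common mechanism, assumed here for illustration and not confirmed by the text above, is a fully convolutional generator: because no layer fixes the signal length, the output duration simply follows the length of the input noise. The toy sketch below uses untrained random kernels and a hypothetical 16 kHz sample rate; `toy_generator` and `conv1d_same` are illustrative names, not part of the authors' code.

```python
import numpy as np

def conv1d_same(x, kernel):
    """'Same'-padded 1-D convolution: output length equals input length."""
    return np.convolve(x, kernel, mode="same")

def toy_generator(noise, kernels):
    """Stack of convolution + tanh layers; no layer depends on signal length."""
    h = noise
    for k in kernels:
        h = np.tanh(conv1d_same(h, k))
    return h

rng = np.random.default_rng(0)
# Untrained, hypothetical weights: four layers with 9-tap kernels.
kernels = [rng.standard_normal(9) * 0.5 for _ in range(4)]

sample_rate = 16_000  # assumed for illustration only
short = toy_generator(rng.standard_normal(2 * sample_rate), kernels)
long = toy_generator(rng.standard_normal(10 * sample_rate), kernels)

# The same weights yield a 2-second and a 10-second signal.
print(short.shape, long.shape)
```

With trained weights in place of the random kernels, the same property would let a model fitted on a short clip emit signals longer than that clip; the sketch only demonstrates the length behaviour, not audio quality.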
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Music Technology and Sound Studies
Methods: VERtex Similarity Embeddings
