Latent Granular Resynthesis using Neural Audio Codecs
Nao Tokui, Tom Baker

TL;DR
This paper presents a neural audio resynthesis method that encodes source audio into a latent codebook and uses it for flexible granular synthesis. It requires no model training and works with diverse audio materials.
Contribution
It introduces a granular synthesis technique that operates on latent vectors, requires no model training, and reduces the discontinuities typical of concatenative resynthesis.
Findings
Creates a versatile latent codebook for audio materials
Produces high-quality, temporally coherent audio reconstructions
Operates without model training, enabling easy experimentation
Abstract
We introduce a novel technique for creative audio resynthesis that operates by reworking the concept of granular synthesis at the latent vector level. Our approach creates a "granular codebook" by encoding a source audio corpus into latent vector segments, then matches each latent grain of a target audio signal to its closest counterpart in the codebook. The resulting hybrid sequence is decoded to produce audio that preserves the target's temporal structure while adopting the source's timbral characteristics. This technique requires no model training, works with diverse audio materials, and naturally avoids the discontinuities typical of traditional concatenative synthesis through the codec's implicit interpolation during decoding. We include supplementary material at https://github.com/naotokui/latentgranular/, as well as a proof-of-concept implementation to allow users to experiment…
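The matching step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the non-overlapping grain layout, the grain size of 4 frames, and the Euclidean distance metric are all assumptions made for illustration. Random arrays stand in for the latent sequences a neural audio codec's encoder would produce, and the final decoding step is omitted.

```python
import numpy as np

def make_grains(latents, grain_size):
    """Split a (frames, dims) latent sequence into non-overlapping grains.

    Trailing frames that do not fill a whole grain are dropped (an assumption).
    """
    n = latents.shape[0] // grain_size
    return latents[: n * grain_size].reshape(n, grain_size, latents.shape[1])

def match_grains(target_grains, codebook):
    """Return, for each target grain, the index of the nearest codebook grain.

    Uses squared Euclidean distance on flattened grains (assumed metric).
    """
    t = target_grains.reshape(len(target_grains), -1)
    c = codebook.reshape(len(codebook), -1)
    # ||t - c||^2 = ||t||^2 - 2 t.c + ||c||^2, computed for all pairs at once
    d2 = (t ** 2).sum(axis=1)[:, None] - 2.0 * (t @ c.T) + (c ** 2).sum(axis=1)[None, :]
    return d2.argmin(axis=1)

def latent_granular_resynthesis(source_latents, target_latents, grain_size=4):
    """Build a hybrid latent sequence from source grains ordered by the target."""
    codebook = make_grains(source_latents, grain_size)      # "granular codebook"
    target_grains = make_grains(target_latents, grain_size)
    idx = match_grains(target_grains, codebook)
    # Replace each target grain with its closest source grain, then flatten
    hybrid = codebook[idx].reshape(-1, source_latents.shape[1])
    return hybrid, idx

# Stand-in data: in practice these would come from a codec encoder
rng = np.random.default_rng(0)
source_latents = rng.normal(size=(200, 8))   # hypothetical source corpus latents
target_latents = rng.normal(size=(50, 8))    # hypothetical target latents

hybrid, idx = latent_granular_resynthesis(source_latents, target_latents)
print(hybrid.shape, idx.shape)
```

The hybrid sequence keeps the target's grain order (its temporal structure) while containing only source latents. In a real pipeline it would be passed to the codec's decoder, which is where the abstract locates the smoothing of grain boundaries.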
Taxonomy
Topics: Music Technology and Sound Studies · Generative Adversarial Networks and Image Synthesis · Music and Audio Processing
