Latent Granular Resynthesis using Neural Audio Codecs

Nao Tokui; Tom Baker

arXiv:2507.19202 · cs.SD · July 28, 2025

TL;DR

This paper presents a neural audio resynthesis method that builds a latent codebook from source audio, enabling flexible, high-quality granular synthesis of diverse sounds without any model training.

Contribution

It introduces a granular synthesis technique that operates on latent vectors, requires no training, and reduces the discontinuities typical of concatenative audio resynthesis.

Findings

1. Creates a versatile latent codebook for audio materials
2. Produces high-quality, temporally coherent audio reconstructions
3. Operates without model training, enabling easy experimentation

Abstract

We introduce a novel technique for creative audio resynthesis that operates by reworking the concept of granular synthesis at the latent vector level. Our approach creates a "granular codebook" by encoding a source audio corpus into latent vector segments, then matches each latent grain of a target audio signal to its closest counterpart in the codebook. The resulting hybrid sequence is decoded to produce audio that preserves the target's temporal structure while adopting the source's timbral characteristics. This technique requires no model training, works with diverse audio materials, and naturally avoids the discontinuities typical of traditional concatenative synthesis through the codec's implicit interpolation during decoding. We include supplementary material at https://github.com/naotokui/latentgranular/ , as well as a proof-of-concept implementation to allow users to experiment…
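The matching step described in the abstract (encode the source into latent grains, match each target grain to its closest codebook grain, then decode the hybrid sequence) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: latent frames are plain lists of floats standing in for a codec encoder's output, grains are fixed windows of `grain_size` frames, the source codebook advances by `source_hop` frames, and "closest" is assumed to mean Euclidean distance. All function names and parameters are hypothetical, and the decoding step is left to whichever codec produced the latents.

```python
import math
from typing import List, Sequence

Frame = List[float]   # one latent vector, e.g. one encoder frame (assumed continuous)
Grain = List[Frame]   # a short run of consecutive latent frames


def make_grains(latents: Sequence[Frame], grain_size: int, hop: int) -> List[Grain]:
    """Cut a latent sequence into windows of `grain_size` frames, advancing by `hop`."""
    if grain_size <= 0 or hop <= 0:
        raise ValueError("grain_size and hop must be positive")
    return [
        [list(frame) for frame in latents[start:start + grain_size]]
        for start in range(0, len(latents) - grain_size + 1, hop)
    ]


def grain_distance(a: Grain, b: Grain) -> float:
    """Euclidean distance over the frames both grains share (zip stops at the shorter one)."""
    return math.sqrt(sum((x - y) ** 2
                         for fa, fb in zip(a, b)
                         for x, y in zip(fa, fb)))


def latent_granular_resynthesis(source: Sequence[Frame], target: Sequence[Frame],
                                grain_size: int = 4, source_hop: int = 1) -> List[Frame]:
    """Replace each target grain with its nearest source grain; return the hybrid latents."""
    codebook = make_grains(source, grain_size, source_hop)  # the "granular codebook"
    if not codebook:
        raise ValueError("source must contain at least one full grain")
    hybrid: List[Frame] = []
    # Non-overlapping target grains keep the output the same length as the target,
    # so the target's timing is preserved while the frames themselves come from the source.
    for start in range(0, len(target), grain_size):
        t_grain = [list(f) for f in target[start:start + grain_size]]
        best = min(codebook, key=lambda g: grain_distance(t_grain, g))
        hybrid.extend(best[:len(t_grain)])  # a short final grain takes only as many frames
    return hybrid


# Toy usage: 2-D "latents" instead of real codec output.
source = [[float(i), 0.0] for i in range(10)]
target = [[5.0, 0.0], [6.0, 0.0], [2.0, 0.0], [3.0, 0.0], [7.2, 0.0]]
hybrid = latent_granular_resynthesis(source, target, grain_size=2)
print(hybrid)  # [[5.0, 0.0], [6.0, 0.0], [2.0, 0.0], [3.0, 0.0], [7.0, 0.0]]
# The hybrid sequence would then be passed to the codec's decoder (not shown here).
```

Each target grain is swapped for the source window it most resembles, and the output always has as many frames as the target, which is how the target's temporal structure carries over. A hop smaller than the grain size yields overlapping codebook grains, giving a denser codebook at the cost of a slower search; at realistic corpus sizes a vectorized or indexed nearest-neighbor search would likely replace this linear scan.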

Taxonomy

Topics: Music Technology and Sound Studies · Generative Adversarial Networks and Image Synthesis · Music and Audio Processing