S-PRESSO: Ultra Low Bitrate Sound Effect Compression With Diffusion Autoencoders And Offline Quantization
Zineb Lahrichi (IP Paris, SonyAI), Gaëtan Hadjeres (SonyAI), Gaël Richard (IP Paris), Geoffroy Peeters (IP Paris)

TL;DR
S-PRESSO is a novel neural audio compression system that achieves ultra-low bitrate sound effect compression down to 0.096 kbps using diffusion autoencoders and offline quantization, producing realistic audio reconstructions.
Contribution
It introduces a new high-resolution sound effect compression method that leverages latent diffusion models and offline quantization for ultra-low bitrate audio encoding.
Findings
Outperforms baseline methods in audio quality and similarity.
Achieves compression rates up to 750x with convincing reconstructions.
Operates on 48 kHz audio at frame rates down to 1 Hz.
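To put the reported figures in perspective, here is a small arithmetic sketch. It assumes that the 0.096 kbps bitrate corresponds to the 1 Hz frame rate, and it uses 16-bit mono PCM as a reference point. The PCM reference is an illustrative assumption, not the baseline behind the 750x figure, whose reference is not stated in this summary. The function names are hypothetical.

```python
def bits_per_frame(bitrate_bps: int, frame_rate_hz: int) -> float:
    """Bits available to describe one latent frame."""
    return bitrate_bps / frame_rate_hz


def pcm_bitrate_bps(sample_rate_hz: int, bit_depth: int = 16, channels: int = 1) -> int:
    """Bitrate of uncompressed PCM audio (assumed reference: 16-bit mono)."""
    return sample_rate_hz * bit_depth * channels


# 0.096 kbps = 96 bits per second; at 1 frame per second that is 96 bits per frame.
frame_budget = bits_per_frame(bitrate_bps=96, frame_rate_hz=1)

# Uncompressed 48 kHz, 16-bit mono PCM (an assumption made only for comparison).
reference = pcm_bitrate_bps(sample_rate_hz=48_000)

# Ratio against the assumed PCM reference, not the summary's 750x figure.
ratio_vs_pcm = reference / 96

print(frame_budget, reference, ratio_vs_pcm)
```

Under these assumptions, each second of audio is described by 96 bits, against 768,000 bits for the assumed PCM reference.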
Abstract
Neural audio compression models have recently achieved extreme compression rates, enabling efficient latent generative modeling. Conversely, latent generative models have been applied to compression, pushing the limits of continuous and discrete approaches. However, existing methods remain constrained to low-resolution audio and degrade substantially at very low bitrates, where audible artifacts are prominent. In this paper, we present S-PRESSO, a 48kHz sound effect compression model that produces both continuous and discrete embeddings at ultra-low bitrates, down to 0.096 kbps, via offline quantization. Our model relies on a pretrained latent diffusion model to decode compressed audio embeddings learned by a latent encoder. Leveraging the generative priors of the diffusion decoder, we achieve extremely low frame rates, down to 1Hz (750x compression rate), producing convincing and…
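The abstract does not say how the offline quantization is done. As a minimal sketch of the general idea, the code below fits a codebook after the fact on embeddings from an already-trained encoder, so the encoder itself is never retrained. It swaps in plain k-means as the quantizer, which is an assumption and not necessarily the paper's method. The random vectors stand in for real encoder outputs, and all names are hypothetical.

```python
import math
import random


def nearest_code(vector, codebook):
    """Index of the codebook entry closest to `vector` (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((v - c) ** 2 for v, c in zip(vector, codebook[k])),
    )


def fit_codebook(embeddings, num_codes, num_iters=10, seed=0):
    """Plain k-means on frozen embeddings; the encoder is not touched."""
    rng = random.Random(seed)
    codebook = [list(v) for v in rng.sample(embeddings, num_codes)]
    for _ in range(num_iters):
        codes = [nearest_code(v, codebook) for v in embeddings]
        for k in range(num_codes):
            members = [v for v, c in zip(embeddings, codes) if c == k]
            if members:  # keep the old centroid if a cluster is empty
                codebook[k] = [sum(dim) / len(members) for dim in zip(*members)]
    return codebook


# Stand-in for continuous embeddings produced by a trained, frozen encoder.
rng = random.Random(0)
frozen_embeddings = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(200)]

codebook = fit_codebook(frozen_embeddings, num_codes=16)
codes = [nearest_code(v, codebook) for v in frozen_embeddings]  # discrete embeddings
reconstruction = [codebook[c] for c in codes]                   # what a decoder would receive
bits_per_code = math.ceil(math.log2(len(codebook)))             # cost of one index
```

Because the codebook is fitted separately, the same encoder can serve both continuous embeddings (skip the quantizer) and discrete ones (send only `codes`), which matches the abstract's claim that the model produces both.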
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Speech and Audio Processing · Music Technology and Sound Studies
