TL;DR
Stable Audio 3 introduces fast, variable-length audio generation and editing using latent diffusion models with a novel autoencoder, enabling high-quality, efficient audio synthesis on consumer hardware.
Contribution
The paper presents a family of latent diffusion models with a new semantic-acoustic autoencoder and adversarial post-training for improved audio generation and editing.
Findings
Models generate several minutes of audio efficiently.
Autoencoder preserves audio fidelity and semantic structure.
Post-training accelerates inference and enhances quality.
Abstract
Stable Audio 3 is a family of fast latent diffusion models (small, medium, large) for variable-length audio generation and editing. Since our models can generate several minutes of audio, variable-length generations are key to avoid the cost of producing full-length generations for short sounds. We also support inpainting, enabling targeted audio editing and the continuation of short recordings. Our latent diffusion models operate on top of a novel semantic-acoustic autoencoder that projects audio into a compact latent space, enabling efficient diffusion-based generation while preserving audio fidelity and encouraging semantic structure in the latent. Finally, we run adversarial post-training to both accelerate inference and improve generation quality, reducing the number of inference steps while improving fidelity and prompt adherence. Stable Audio 3 models are trained on licensed and…
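To make the variable-length and inpainting ideas concrete, here is a minimal sketch, not the authors' implementation. It assumes a hypothetical 44.1 kHz sample rate and a hypothetical autoencoder downsampling factor of 2048 samples per latent frame; neither number comes from the abstract. The helper names `latent_frames`, `inpaint_mask`, and `blend` are illustrative only.

```python
import math

# ASSUMPTIONS (not from the paper): 44.1 kHz audio and 2048 audio samples
# per latent frame. Both values are placeholders for illustration.
SAMPLE_RATE = 44_100
SAMPLES_PER_LATENT = 2_048


def latent_frames(duration_s: float) -> int:
    """Latent frames needed to cover `duration_s` seconds of audio."""
    return math.ceil(duration_s * SAMPLE_RATE / SAMPLES_PER_LATENT)


def inpaint_mask(total_frames: int, keep_start: int, keep_end: int) -> list[int]:
    """1 marks a known latent frame to keep, 0 marks a frame to regenerate."""
    return [1 if keep_start <= i < keep_end else 0 for i in range(total_frames)]


def blend(known: list[float], generated: list[float], mask: list[int]) -> list[float]:
    """Keep known values where mask is 1, take generated values elsewhere."""
    return [k if m else g for k, g, m in zip(known, generated, mask)]


# A short sound needs far fewer latent frames than a multi-minute track,
# which is why a fixed full-length generation would be wasteful.
short = latent_frames(5.0)    # 5-second sound effect
full = latent_frames(180.0)   # 3-minute track

# Continuation is inpainting with a kept prefix: keep the first 2 frames,
# regenerate the remaining 4.
mask = inpaint_mask(6, 0, 2)
```

Under these assumed numbers, a 5-second clip occupies 108 latent frames while a 3-minute track occupies 3876, so a variable-length model processes roughly 36 times fewer frames for the short sound. The mask shows how continuing a short recording and editing a region in the middle can both be described as the same operation with different kept ranges.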
Code & Models
- stabilityai/stable-audio-3-medium (model): 14k downloads, 134 likes
- stabilityai/stable-audio-3-small-music (model): 9.2k downloads, 56 likes
- stabilityai/stable-audio-3-small-sfx (model): 4.5k downloads, 40 likes
- stabilityai/stable-audio-3-optimized (model): 15 likes
- stabilityai/stable-audio-3-medium-base (model): 1.4k downloads, 18 likes
- stabilityai/stable-audio-3-small-music-base (model): 694 downloads, 10 likes
- stabilityai/stable-audio-3-small-sfx-base (model): 473 downloads, 7 likes
- cocktailpeanut/stable-audio-3-small-sfx (model): 645 downloads
- cocktailpeanut/stable-audio-3-small-music (model): 1.7k downloads
- cocktailpeanut/stable-audio-3-medium (model): 1.4k downloads
