Stable Audio 3

Zach Evans; Julian D. Parker; Matthew Rice; CJ Carr; Zack Zukowski; Josiah Taylor; Jordi Pons

arXiv:2605.17991 · cs.SD · May 19, 2026

TL;DR

Stable Audio 3 introduces fast, variable-length audio generation and editing using latent diffusion models with a novel autoencoder, enabling high-quality, efficient audio synthesis on consumer hardware.

Contribution

The paper presents a family of latent diffusion models with a new semantic-acoustic autoencoder and adversarial post-training for improved audio generation and editing.

Findings

01. The models can generate several minutes of audio efficiently.

02. The autoencoder preserves audio fidelity and encourages semantic structure in the latent space.

03. Adversarial post-training accelerates inference and improves generation quality.

Abstract

Stable Audio 3 is a family of fast latent diffusion models (small, medium, large) for variable-length audio generation and editing. Since our models can generate several minutes of audio, variable-length generation is key to avoiding the cost of producing full-length generations for short sounds. We also support inpainting, enabling targeted audio editing and the continuation of short recordings. Our latent diffusion models operate on top of a novel semantic-acoustic autoencoder that projects audio into a compact latent space, enabling efficient diffusion-based generation while preserving audio fidelity and encouraging semantic structure in the latent. Finally, we run adversarial post-training to both accelerate inference and improve generation quality, reducing the number of inference steps while improving fidelity and prompt adherence. Stable Audio 3 models are trained on licensed and…
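To make the variable-length and inpainting ideas in the abstract concrete, here is a minimal sketch of how a requested duration might map to a latent sequence length, and how a continuation mask over latent frames could be built. This is not the paper's implementation: the `LatentConfig` class, its default sample rate and downsampling ratio, and both function names are hypothetical, chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class LatentConfig:
    # Hypothetical values for illustration only; the abstract does not
    # state the sample rate or the autoencoder's downsampling ratio.
    sample_rate: int = 44_100
    downsampling_ratio: int = 2_048


def latent_length(duration_s: float, cfg: LatentConfig) -> int:
    """Number of latent frames needed to cover `duration_s` seconds."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    samples = math.ceil(duration_s * cfg.sample_rate)
    return math.ceil(samples / cfg.downsampling_ratio)


def continuation_mask(known_s: float, total_s: float, cfg: LatentConfig) -> list[int]:
    """Mask over latent frames: 1 = keep (known audio), 0 = generate.

    Only frames fully covered by the known recording are marked as kept,
    so the partially covered boundary frame is regenerated.
    """
    if known_s < 0 or known_s > total_s:
        raise ValueError("known_s must lie in [0, total_s]")
    total = latent_length(total_s, cfg)
    known = min(int(known_s * cfg.sample_rate) // cfg.downsampling_ratio, total)
    return [1] * known + [0] * (total - known)


if __name__ == "__main__":
    cfg = LatentConfig()
    # A short sound needs far fewer latent frames than a multi-minute track,
    # which is why generating at the requested length saves compute.
    print(latent_length(2.0, cfg), latent_length(180.0, cfg))
    # Continue a 1-second recording out to 2 seconds.
    mask = continuation_mask(1.0, 2.0, cfg)
    print(sum(mask), len(mask))
```

Under these assumed settings, a short sound occupies a small fraction of the latent frames a multi-minute generation would need, which is the cost the abstract says variable-length generation avoids. The same mask shape could describe targeted edits by placing the zeros in the middle of the sequence rather than at the end.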

Code & Models

Repositories

stability-ai/stable-audio-3
github
