Stable Audio Open
Zach Evans, Julian D. Parker, CJ Carr, Zack Zukowski, Josiah Taylor, Jordi Pons

TL;DR
This paper introduces an open-weights text-to-audio model trained on Creative Commons data. The model is competitive with the state of the art and produces high-quality stereo audio at 44.1 kHz, giving the community an accessible baseline for further research.
Contribution
It describes the architecture and training process of a new open-weights text-to-audio model trained on Creative Commons data, making high-quality audio generation accessible to artists and researchers.
Findings
Performance competitive with the state of the art across various metrics
High-quality stereo sound synthesis at 44.1 kHz, as indicated by FD_openl3 results
Model weights are openly available for community use
Abstract
Open generative models are vitally important for the community, allowing for fine-tunes and serving as baselines when presenting new models. However, most current text-to-audio models are private and not accessible for artists and researchers to build upon. Here we describe the architecture and training process of a new open-weights text-to-audio model trained with Creative Commons data. Our evaluation shows that the model's performance is competitive with the state-of-the-art across various metrics. Notably, the reported FD_openl3 results (measuring the realism of the generations) showcase its potential for high-quality stereo sound synthesis at 44.1kHz.
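The abstract cites FD_openl3 without defining it. As with Fréchet Audio Distance-style metrics, it is understood here as the Fréchet distance between Gaussians fit to embeddings (OpenL3, in this case) of reference and generated audio: FD = ||μ_r − μ_g||² + Tr(Σ_r + Σ_g − 2(Σ_r Σ_g)^{1/2}). The sketch below computes that formula with NumPy, assuming the embeddings have already been extracted as (frames, dim) arrays. The function names are hypothetical, and this is an illustration of the distance, not the paper's evaluation code.

```python
import numpy as np


def _psd_sqrt(mat: np.ndarray) -> np.ndarray:
    """Matrix square root of a symmetric positive semi-definite matrix."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 0.0, None)  # remove tiny negative round-off
    # V diag(sqrt(vals)) V^T
    return (vecs * np.sqrt(vals)) @ vecs.T


def frechet_distance(emb_a: np.ndarray, emb_b: np.ndarray) -> float:
    """Fréchet distance between Gaussians fit to two embedding sets.

    emb_a, emb_b: arrays of shape (num_frames, dim), e.g. OpenL3 embeddings
    of reference and generated audio (extraction is assumed, not shown).
    """
    mu_a, mu_b = emb_a.mean(axis=0), emb_b.mean(axis=0)
    cov_a = np.cov(emb_a, rowvar=False)
    cov_b = np.cov(emb_b, rowvar=False)

    # Tr((cov_a cov_b)^{1/2}) equals the trace of the square root of the
    # symmetric matrix cov_a^{1/2} cov_b cov_a^{1/2}, which is numerically safer.
    sqrt_a = _psd_sqrt(cov_a)
    middle = sqrt_a @ cov_b @ sqrt_a
    eigvals = np.clip(np.linalg.eigvalsh(middle), 0.0, None)
    trace_sqrt = float(np.sqrt(eigvals).sum())

    mean_term = float(np.sum((mu_a - mu_b) ** 2))
    cov_term = float(np.trace(cov_a) + np.trace(cov_b)) - 2.0 * trace_sqrt
    return mean_term + cov_term
```

Lower values mean the generated-audio embeddings are distributed more like the reference embeddings. Fitting a single Gaussian per set is what makes the distance cheap to compute; it ignores higher-order structure in the embedding distribution.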
Code & Models
- 🤗 stabilityai/stable-audio-open-1.0 (model) · 23k downloads · ♡ 1430
- 🤗 PsiPi/audio (model) · ♡ 3
- 🤗 RedbeardNZ/stable-audio-open-1.0 (model) · 21 downloads · ♡ 1
- 🤗 ford442/stable-audio-open-1.0 (model) · 228 downloads
- 🤗 ModelsLab/stable-audio-open-1.0 (model) · 1.6k downloads
- 🤗 jasonvassallo/mlx-audio-generate (model) · ♡ 1
- 🤗 LanguaMan/stable-audio-open-1-0-model (model) · 6 downloads
Taxonomy
Topics: Music Technology and Sound Studies · Music and Audio Processing
