Stable Audio Open

Zach Evans; Julian D. Parker; CJ Carr; Zack Zukowski; Josiah Taylor; Jordi Pons

arXiv:2407.14358 · cs.SD · August 1, 2024 · 1 citation

Open Access · 1 Repo · 7 Models

TL;DR

This paper introduces an open-weights text-to-audio model trained on Creative Commons data, demonstrating competitive performance and high-quality stereo sound synthesis at 44.1kHz, facilitating community access and further research.

Contribution

The paper presents the architecture and training process of a new open-weights text-to-audio model, making high-quality audio generation accessible to artists and researchers.

Findings

01 Competitive performance across various metrics

02 High-quality stereo sound synthesis at 44.1kHz

03 Model is openly available for community use

Abstract

Open generative models are vitally important for the community, allowing for fine-tunes and serving as baselines when presenting new models. However, most current text-to-audio models are private and not accessible for artists and researchers to build upon. Here we describe the architecture and training process of a new open-weights text-to-audio model trained with Creative Commons data. Our evaluation shows that the model's performance is competitive with the state-of-the-art across various metrics. Notably, the reported FDopenl3 results (measuring the realism of the generations) showcase its potential for high-quality stereo sound synthesis at 44.1kHz.
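The FD metric named in the abstract is a Fréchet distance computed on OpenL3 embeddings of reference and generated audio. As a rough illustration of that kind of calculation, the sketch below fits a Gaussian to each set of embeddings and evaluates the closed-form distance between the two Gaussians. The function name `frechet_distance` is hypothetical, random arrays stand in for real embeddings, and the embedding extraction step is not shown; this is not the paper's evaluation code.

```python
import numpy as np


def frechet_distance(ref_emb, gen_emb):
    """Frechet distance between Gaussians fitted to two embedding sets.

    Each input has shape (n_samples, dim). Hypothetical helper for
    illustration only; real embeddings would come from an audio model.
    """
    ref_emb = np.asarray(ref_emb, dtype=np.float64)
    gen_emb = np.asarray(gen_emb, dtype=np.float64)

    # Fit a Gaussian (mean and covariance) to each set of embeddings.
    mu_r, mu_g = ref_emb.mean(axis=0), gen_emb.mean(axis=0)
    cov_r = np.atleast_2d(np.cov(ref_emb, rowvar=False))
    cov_g = np.atleast_2d(np.cov(gen_emb, rowvar=False))

    # Tr((cov_r cov_g)^(1/2)) is the sum of the square roots of the
    # eigenvalues of cov_r @ cov_g, which are real and non-negative for
    # positive semi-definite inputs (clip guards against rounding noise).
    eigvals = np.linalg.eigvals(cov_r @ cov_g)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(cov_r) + np.trace(cov_g) - 2.0 * tr_sqrt)


# Stand-in data: random vectors instead of real audio embeddings.
rng = np.random.default_rng(0)
reference = rng.normal(size=(500, 8))
shifted = reference + 2.0  # same covariance, different mean

print(frechet_distance(reference, reference))
print(frechet_distance(reference, shifted))
```

Shifting every embedding by the same vector leaves the covariance unchanged, so the second value reduces to the squared norm of the shift. Lower values mean the two distributions are closer.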

Code & Models

Repositories

stability-ai/stable-audio-tools
PyTorch · Official

Taxonomy

Topics: Music Technology and Sound Studies · Music and Audio Processing