Woosh: A Sound Effects Foundation Model

Gaëtan Hadjeres; Marc Ferras; Khaled Koutini; Benno Weck; Alexandre Bittar; Thomas Hummel; Zineb Lahrichi; Hakim Missoum; Joan Serrà; Yuki Mitsufuji

arXiv:2604.01929 · cs.SD · April 30, 2026

TL;DR

Woosh is Sony AI's publicly released sound effects foundation model. It combines an audio encoder/decoder, a text-audio alignment model, and text-to-audio and video-to-audio generators, performs competitively with existing open models, and ships with code and demos.

Contribution

First open sound effects foundation model with multiple modules, optimized for sound effects, and publicly released with code and demos.

Findings

01. Competitive or better performance per module against existing open models such as StableAudio-Open and TangoFlux, on both public and private data

02. Four modules: audio encoder/decoder, text-audio alignment, text-to-audio, and video-to-audio

03. Distilled text-to-audio and video-to-audio models support low-resource operation and fast inference

Abstract

The audio research community depends on open generative models as foundational tools for building novel approaches and establishing baselines. In this report, we present Woosh, Sony AI's publicly released sound effect foundation model, detailing its architecture, training process, and an evaluation against other popular open models. Optimized for sound effects, the release provides (1) a high-quality audio encoder/decoder model and (2) a text-audio alignment model for conditioning, together with (3) text-to-audio and (4) video-to-audio generative models. Distilled text-to-audio and video-to-audio models are also included in the release, allowing for low-resource operation and fast inference. Our evaluation on both public and private data shows competitive or better performance for each module when compared to existing open alternatives like StableAudio-Open and TangoFlux. Inference code and…

Code & Models

Repositories

SonyResearch/Woosh (GitHub)
