CRASH: Raw Audio Score-based Generative Modeling for Controllable High-resolution Drum Sound Synthesis
Simon Rouard, Gaëtan Hadjeres

TL;DR
CRASH introduces a high-resolution, controllable raw audio generative model for drum sounds using diffusion processes, enabling diverse sampling methods and hybrid sound creation with improved flexibility and training simplicity.
Contribution
The paper presents a novel diffusion-based approach for raw audio synthesis that achieves high-resolution, controllable drum sound generation with versatile sampling options.
Findings
Enables high-resolution 44.1kHz drum sound synthesis
Offers multiple sampling schemes including inpainting and interpolation
Introduces class-mixing sampling for hybrid sound creation
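As a rough illustration of how a hybrid between two classes could be sampled, the sketch below blends per-class score functions with a convex combination and runs plain Langevin dynamics on the result. Everything here is an assumption made for illustration: the 1-D Gaussian "classes", the names `gaussian_score`, `mixed_score`, and `langevin_sample`, and the mixing rule itself are hypothetical stand-ins, not the trained conditional U-Net or the exact sampler described in the paper.

```python
import math
import random


def gaussian_score(x, mean, std):
    """Score (gradient of the log-density) of a 1-D Gaussian N(mean, std^2)."""
    return -(x - mean) / (std ** 2)


def mixed_score(x, class_means, weights, std=1.0):
    """Convex combination of per-class scores (assumed mixing rule)."""
    assert abs(sum(weights) - 1.0) < 1e-9, "weights must sum to 1"
    return sum(w * gaussian_score(x, m, std)
               for w, m in zip(weights, class_means))


def langevin_sample(score_fn, n_steps=2000, step=0.01, seed=0):
    """Unadjusted Langevin dynamics driven by an arbitrary score function."""
    rng = random.Random(seed)
    x = rng.gauss(0.0, 1.0)  # start from standard Gaussian noise
    for _ in range(n_steps):
        noise = rng.gauss(0.0, 1.0)
        x = x + step * score_fn(x) + math.sqrt(2.0 * step) * noise
    return x


if __name__ == "__main__":
    # Two toy "classes" centred at -2 and 4, mixed half and half.
    means, weights = [-2.0, 4.0], [0.5, 0.5]
    sample = langevin_sample(lambda x: mixed_score(x, means, weights))
    print(sample)
```

For equal-variance Gaussians the blended score is exactly the score of a Gaussian centred at the weighted average of the class means, so the samples land between the two classes. With a learned score network the blend has no such closed form, which is why this should be read as intuition only.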
Abstract
In this paper, we propose a novel score-based generative model for unconditional raw audio synthesis. Our proposal builds upon the latest developments on diffusion process modeling with stochastic differential equations, which already demonstrated promising results on image generation. We motivate novel heuristics for the choice of the diffusion processes better suited for audio generation, and consider the use of a conditional U-Net to approximate the score function. While previous approaches on diffusion models on audio were mainly designed as speech vocoders in medium resolution, our method termed CRASH (Controllable Raw Audio Synthesis with High-resolution) allows us to generate short percussive sounds in 44.1kHz in a controllable way. Through extensive experiments, we showcase on a drum sound generation task the numerous sampling schemes offered by our method (unconditional…
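To make the idea of a diffusion process on raw audio concrete, here is a minimal sketch of the forward (noising) step under the standard variance-preserving SDE with a linear noise schedule. This is the textbook formulation, not the audio-specific diffusion processes the abstract says the paper motivates. The schedule constants and the names `vp_coefficients`, `perturb`, and `toy_drum_hit` are assumptions chosen for illustration.

```python
import math
import random

# Assumed linear schedule constants (common defaults, not taken from the paper).
BETA_MIN, BETA_MAX = 0.1, 20.0


def vp_coefficients(t):
    """Return (mean_scale, noise_std) of x_t given x_0 for t in [0, 1]."""
    integral = BETA_MIN * t + 0.5 * (BETA_MAX - BETA_MIN) * t * t
    mean_scale = math.exp(-0.5 * integral)
    noise_std = math.sqrt(1.0 - mean_scale ** 2)
    return mean_scale, noise_std


def perturb(waveform, t, rng):
    """Sample x_t = mean_scale * x_0 + noise_std * eps with Gaussian eps."""
    mean_scale, noise_std = vp_coefficients(t)
    return [mean_scale * x + noise_std * rng.gauss(0.0, 1.0) for x in waveform]


def toy_drum_hit(n_samples=441, sample_rate=44100, freq=180.0, decay=60.0):
    """Exponentially decaying sinusoid used as a stand-in percussive sound."""
    return [
        math.exp(-decay * n / sample_rate)
        * math.sin(2.0 * math.pi * freq * n / sample_rate)
        for n in range(n_samples)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    clean = toy_drum_hit()
    for t in (0.0, 0.25, 1.0):
        scale, std = vp_coefficients(t)
        noisy = perturb(clean, t, rng)
        print(f"t={t:.2f} mean_scale={scale:.4f} noise_std={std:.4f} len={len(noisy)}")
```

At `t = 0` the waveform is untouched, and by `t = 1` it is almost pure Gaussian noise. A score model is trained to reverse this corruption, and generation runs the process backwards from noise.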
Taxonomy
Topics: Music and Audio Processing · Music Technology and Sound Studies · Speech and Audio Processing
Methods: Diffusion · Concatenated Skip Connection · Max Pooling · Convolution · U-Net
