TL;DR
Break-the-Beat! is a controllable MIDI-to-drum audio synthesis model that renders a drum MIDI with the timbre of a reference audio, giving digital music producers high-quality, rhythmically aligned drum audio.
Contribution
The paper fine-tunes a pre-trained text-to-audio model with a proposed content encoder and a hybrid conditioning mechanism, and constructs a new dataset of paired target-reference drum audio, to enable controllable, high-fidelity drum audio synthesis from MIDI.
Findings
High-quality drum audio generated with strong rhythmic alignment.
Model outperforms existing methods on quality, rhythm, and beat continuity metrics.
Constructed a new paired drum audio dataset for training and evaluation.
Abstract
Current methods for creating drum loop audio in digital music production, such as using one-shot samples or resampling, often demand non-trivial effort from creators. While recent generative models achieve high fidelity and adhere to text, they lack the specific control needed for such a task. Existing symbolic-to-audio research often focuses on single, tonal instruments, leaving the challenge of polyphonic, percussive drum synthesis unaddressed. We address this gap by introducing "Break-the-Beat!," a model capable of rendering a drum MIDI with the timbre of a reference audio. It is built by fine-tuning a pre-trained text-to-audio model with our proposed content encoder and an effective hybrid conditioning mechanism. To enable this, we construct a new dataset of paired target-reference drum audio from existing drum audio datasets. Experiments demonstrate that our model generates…
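The abstract does not say how the drum MIDI is represented before it reaches the content encoder. Purely as an illustration of what "polyphonic, percussive" symbolic input can look like, the sketch below rasterizes drum note events into a frame-level onset grid, one row per drum class. The event format, the three-class mapping (General MIDI percussion pitches 36, 38, 42), and the function name `rasterize` are assumptions made for this example, not details from the paper.

```python
from typing import List, Tuple

# Hypothetical event format: (onset in seconds, MIDI pitch, velocity 1..127).
DrumEvent = Tuple[float, int, int]

# Assumed mapping from General MIDI percussion pitches to grid rows:
# 36 = bass drum, 38 = acoustic snare, 42 = closed hi-hat.
DRUM_ROWS = {36: 0, 38: 1, 42: 2}


def rasterize(events: List[DrumEvent], duration: float,
              frame_rate: float) -> List[List[float]]:
    """Return a [num_classes][num_frames] grid of normalized onset velocities."""
    num_frames = int(round(duration * frame_rate))
    grid = [[0.0] * num_frames for _ in range(len(DRUM_ROWS))]
    for onset, pitch, velocity in events:
        row = DRUM_ROWS.get(pitch)
        frame = int(round(onset * frame_rate))  # nearest frame to the onset
        if row is None or not 0 <= frame < num_frames:
            continue  # skip unmapped pitches and out-of-range onsets
        # Simultaneous hits on different drums land in different rows,
        # so polyphony is preserved; repeated hits keep the loudest.
        grid[row][frame] = max(grid[row][frame], velocity / 127.0)
    return grid


# One-second pattern at 4 frames per second: kick and hi-hat together
# on the first frame, snare and hi-hat together on the third.
events = [(0.0, 36, 127), (0.5, 38, 100), (0.0, 42, 64), (0.5, 42, 64)]
grid = rasterize(events, duration=1.0, frame_rate=4.0)
```

A grid like this keeps only onset timing and velocity, which matches the split the abstract describes: rhythm comes from the MIDI, while timbre is supplied separately by the reference audio.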
