Break-the-Beat! Controllable MIDI-to-Drum Audio Synthesis

Shuyang Cui; Zhi Zhong; Qiyu Wu; Zachary Novack; Woosung Choi; Keisuke Toyama; Kin Wai Cheuk; Junghyun Koo; Yukara Ikemiya; Christian Simon; Chihiro Nagashima; Shusuke Takahashi

arXiv:2605.14555·cs.SD·May 15, 2026

TL;DR

Break-the-Beat! is a controllable MIDI-to-drum audio synthesis model that renders a drum MIDI with the timbre of a reference audio, giving digital music producers high-quality, rhythmically aligned drum audio.

Contribution

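The paper fine-tunes a pre-trained text-to-audio model with a proposed content encoder and a hybrid conditioning mechanism, and constructs a new dataset of paired target-reference drum audio. It targets polyphonic, percussive drum synthesis, which existing symbolic-to-audio research leaves unaddressed.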

Findings

01. High-quality drum audio generated with strong rhythmic alignment.

02. Model outperforms existing methods on quality, rhythm, and beat continuity metrics.

03. Constructed a new paired drum audio dataset for training and evaluation.

Abstract

Current methods for creating drum loop audio in digital music production, such as using one-shot samples or resampling, often demand non-trivial effort from creators. While recent generative models achieve high fidelity and adhere to text prompts, they lack the specific control needed for such a task. Existing symbolic-to-audio research often focuses on single, tonal instruments, leaving the challenge of polyphonic, percussive drum synthesis unaddressed. We address this gap by introducing "Break-the-Beat!", a model capable of rendering a drum MIDI with the timbre of a reference audio. It is built by fine-tuning a pre-trained text-to-audio model with our proposed content encoder and an effective hybrid conditioning mechanism. To enable this, we construct a new dataset of paired target-reference drum audio from existing drum audio datasets. Experiments demonstrate that our model generates…
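
To make the "drum MIDI" input concrete, here is a minimal sketch of one common way to turn drum note events into a frame-level onset grid that a content encoder could consume. This is an illustration only: the abstract does not describe the paper's actual input representation, so the event format, the `rasterize_drum_events` helper, the three-drum subset, and the 100 Hz frame rate are all assumptions. The pitch numbers are the standard General MIDI percussion keys.

```python
from typing import Dict, List, Tuple

# Hypothetical event format: (onset time in seconds, MIDI pitch, velocity 1-127).
DrumEvent = Tuple[float, int, int]

# General MIDI percussion keys for three common drums, mapped to grid rows.
# The choice of this three-drum subset is an assumption for illustration.
GM_DRUM_ROWS: Dict[int, int] = {
    36: 0,  # Bass Drum 1
    38: 1,  # Acoustic Snare
    42: 2,  # Closed Hi-Hat
}


def rasterize_drum_events(
    events: List[DrumEvent],
    duration_s: float,
    frame_rate_hz: int = 100,  # assumed frame rate, not taken from the paper
) -> List[List[float]]:
    """Return a [drum row][frame] grid holding normalized onset velocities."""
    n_frames = int(round(duration_s * frame_rate_hz))
    grid = [[0.0] * n_frames for _ in range(len(GM_DRUM_ROWS))]
    for onset_s, pitch, velocity in events:
        row = GM_DRUM_ROWS.get(pitch)
        frame = int(round(onset_s * frame_rate_hz))
        # Skip drums outside the subset and onsets outside the clip.
        if row is None or not 0 <= frame < n_frames:
            continue
        # Keep the loudest hit if two events land on the same frame.
        grid[row][frame] = max(grid[row][frame], velocity / 127.0)
    return grid


if __name__ == "__main__":
    pattern = [(0.0, 36, 127), (0.25, 42, 100), (0.5, 38, 64)]
    grid = rasterize_drum_events(pattern, duration_s=1.0)
    print(len(grid), len(grid[0]))
```

Unlike a tonal piano roll, this grid stores only onsets and velocities, since drum hits have no meaningful sustained note length; several rows can be active in the same frame, which is what makes the task polyphonic.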

Code & Models

Repositories

https://ik4sumii.github.io/break-the-beat
