Drum Synthesis from Expressive Drum Grids via Neural Audio Codecs

Konstantinos Soiledis; Maximos Kaliakatsos-Papakostas; Dimos Makris; Konstantinos Tsamis

arXiv:2605.10281 · cs.SD · May 12, 2026

TL;DR

This paper introduces a neural network system that converts expressive drum MIDI grids into realistic drum audio by predicting neural audio codec tokens. The system is evaluated on the E-GMD dataset.
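
The exact grid format is not specified in this summary. As a rough sketch of what an "expressive drum grid" can look like, the Python below assumes a common layout: one row per 16th-note step and one column per drum voice, with three parallel arrays for hits, normalized velocities, and microtiming offsets measured as a fraction of a step. The voice mapping `VOICES`, the `NoteEvent` class, and `build_grid` are hypothetical names chosen for illustration, not the authors' code.

```python
import math
from dataclasses import dataclass

# Hypothetical drum-voice mapping: General MIDI pitch -> grid column.
# The instrument set used in the paper may differ.
VOICES = {36: 0, 38: 1, 42: 2, 46: 3, 49: 4}  # kick, snare, closed HH, open HH, crash


@dataclass
class NoteEvent:
    time_s: float  # onset time in seconds
    pitch: int     # General MIDI drum pitch
    velocity: int  # MIDI velocity, 1-127


def build_grid(notes, bpm=120.0, steps_per_beat=4, n_steps=16):
    """Return (hits, velocities, offsets), each n_steps rows x len(VOICES) columns."""
    step_s = 60.0 / bpm / steps_per_beat  # duration of one grid step in seconds
    n_voices = len(VOICES)
    hits = [[0] * n_voices for _ in range(n_steps)]
    vels = [[0.0] * n_voices for _ in range(n_steps)]
    offs = [[0.0] * n_voices for _ in range(n_steps)]
    for note in notes:
        col = VOICES.get(note.pitch)
        if col is None:
            continue  # pitch outside the hypothetical voice set
        pos = note.time_s / step_s        # position in fractional steps
        step = math.floor(pos + 0.5)      # nearest grid step
        if not 0 <= step < n_steps:
            continue  # onset falls outside the grid window
        # A later note on the same cell overwrites an earlier one.
        hits[step][col] = 1
        vels[step][col] = note.velocity / 127.0
        offs[step][col] = pos - step      # microtiming, in [-0.5, 0.5) of a step
    return hits, vels, offs


# One bar at 120 BPM: kick on the downbeat, a slightly early hi-hat, a slightly late snare.
notes = [NoteEvent(0.0, 36, 100), NoteEvent(0.245, 42, 64), NoteEvent(0.51, 38, 90)]
hits, vels, offs = build_grid(notes)
print(hits[2][2], round(offs[2][2], 3))  # hi-hat on step 2, early
print(hits[4][1], round(offs[4][1], 3))  # snare on step 4, late
```

At 120 BPM one step lasts 0.125 s, so the hi-hat at 0.245 s is quantized to step 2 with an offset of about -0.04 of a step, and the snare at 0.51 s to step 4 with about +0.08. Storing the offset next to the quantized position is what lets such a grid stay time-aligned while still carrying microtiming.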

Contribution

The paper shows that predicting neural audio codec tokens from drum grids is an effective route to realistic drum synthesis, and it compares several state-of-the-art codecs (EnCodec, DAC, and X-Codec).
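
Because codecs differ in frame rate and number of codebooks, the length of the token sequence the model has to predict also differs. The hypothetical helper `tokens_for_clip` below does that arithmetic: frames per second is the sample rate divided by the codec's hop length, and total tokens is frames times codebooks. The EnCodec settings (24 kHz model, 320-sample hop, 8 codebooks at its 6 kbps setting) and DAC settings (44.1 kHz model, 512-sample hop, 9 codebooks) reflect those codecs' public releases; they are illustrative and not necessarily the configurations used in the paper. X-Codec is omitted here.

```python
import math


def tokens_for_clip(duration_s, sample_rate, hop_length, n_codebooks):
    """Return (codec frames, total discrete tokens) for one audio clip."""
    frame_rate = sample_rate / hop_length          # codec frames per second
    n_frames = math.ceil(duration_s * frame_rate)  # assumes padding up to a whole frame
    return n_frames, n_frames * n_codebooks


# Illustrative settings, not necessarily those used in the paper.
CODECS = {
    "EnCodec 24 kHz, 6 kbps": dict(sample_rate=24_000, hop_length=320, n_codebooks=8),
    "DAC 44.1 kHz": dict(sample_rate=44_100, hop_length=512, n_codebooks=9),
}

for name, cfg in CODECS.items():
    frames, tokens = tokens_for_clip(4.0, **cfg)
    print(f"{name}: {frames} frames, {tokens} tokens for a 4 s clip")
```

For a 4 s clip (two bars of 4/4 at 120 BPM), this gives 300 frames and 2400 tokens for the EnCodec setting versus 345 frames and 3105 tokens for the DAC setting, so the codec choice changes how long a sequence the model must produce for the same musical content.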

Findings

01. Codec-token prediction yields high-fidelity drum audio

02. Transformer models effectively map MIDI to audio tokens

03. Choice of neural codec impacts audio quality

Abstract

Generating realistic drum audio directly from symbolic representations is a challenging task at the intersection of music perception and machine learning. We propose a system that transforms an expressive drum grid, a time-aligned MIDI representation with microtiming and velocity information, into drum audio by predicting discrete codes of a neural audio codec. Our approach uses a Transformer-based model to map the drum grid input to a sequence of codec tokens, which are then converted to waveform audio via a pre-trained codec decoder. We experiment with multiple state-of-the-art neural codecs, namely EnCodec, DAC, and X-Codec, to assess how the choice of audio representation impacts the quality of the generated drums. The system is trained and evaluated on the Expanded Groove MIDI Dataset (E-GMD), a large collection of human drum performances with paired MIDI and audio. We evaluate the…
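
To make "discrete codes of a neural audio codec" concrete: EnCodec and DAC both quantize each latent frame with residual vector quantization (RVQ), where each codebook encodes what the previous ones left over, yielding one token per codebook per frame. The toy sketch below uses two hand-made 2-D codebooks in place of learned ones; `rvq_encode`, `rvq_decode`, and `CODEBOOKS` are illustrative names, not part of any codec's API. It stops at the reconstructed latent vector; in a real codec, the pre-trained neural decoder then turns the sequence of such vectors into a waveform.

```python
def nearest(codebook, vec):
    """Index of the codeword closest to vec under squared Euclidean distance."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vec)),
    )


def rvq_encode(vec, codebooks):
    """Greedy residual vector quantization: one token (index) per codebook."""
    residual = list(vec)
    tokens = []
    for cb in codebooks:
        idx = nearest(cb, residual)
        tokens.append(idx)
        # Pass whatever this stage could not represent on to the next codebook.
        residual = [r - c for r, c in zip(residual, cb[idx])]
    return tokens


def rvq_decode(tokens, codebooks):
    """Reconstruct the latent frame by summing the selected codewords."""
    out = [0.0] * len(codebooks[0][0])
    for idx, cb in zip(tokens, codebooks):
        out = [o + c for o, c in zip(out, cb[idx])]
    return out


# Toy 2-D codebooks standing in for learned ones (EnCodec and DAC use 1024 entries each).
CODEBOOKS = [
    [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],        # coarse stage
    [[0.0, 0.0], [0.25, 0.0], [0.0, 0.25], [-0.25, -0.25]],  # refinement stage
]

latent = [1.2, 0.9]  # one hypothetical latent frame
tokens = rvq_encode(latent, CODEBOOKS)
print(tokens, rvq_decode(tokens, CODEBOOKS))
```

Here the first codebook picks the coarse codeword [1, 1] (token 3) and the second refines it with [0.25, 0] (token 1), reconstructing [1.25, 1.0]. In this picture, a token-predicting model only has to output the index lists; the codebooks and decoder come with the pre-trained codec.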