Seconds-Aligned PCA-DAC Latent Diffusion for Symbolic-to-Audio Drum Rendering

Konstantinos Soiledis; Maximos Kaliakatsos Papakostas; Dimos Makris; Konstantinos Tsamis

arXiv:2605.13404 · cs.SD · May 14, 2026

TL;DR

This paper introduces Sec2Drum-DAC, a latent diffusion model for symbolic-to-audio drum rendering that preserves event timing and dynamics and improves paired spectral and transient metrics over a deterministic PCA-regression baseline.

Contribution

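The paper proposes a latent diffusion approach to symbolic drum rendering that conditions on event features and, instead of generating waveform samples, predicts a compact set of standardized principal-component (PCA) coordinates of frozen DAC codec embeddings, which are mapped deterministically back to the DAC latent space and decoded to audio.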
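Conditioning on event features "sampled in physical time at codec-frame locations" means the symbolic events are placed on the codec's frame grid in seconds rather than on a beat grid. The sketch below shows one way to do that rasterization. It is a minimal illustration, not the authors' feature extractor: the feature layout (one velocity channel per instrument), nearest-frame rounding, the helper name `event_frame_features`, and the default 44.1 kHz sample rate with a hop of 512 samples are all assumptions.

```python
import numpy as np

def event_frame_features(onsets_s, velocities, instruments, n_instruments,
                         window_s, sample_rate=44100, hop=512):
    """Rasterize drum events onto a codec frame grid in physical time.

    Hypothetical feature layout: one velocity channel per instrument,
    written at the frame whose start time is nearest to the onset.
    sample_rate and hop are assumed codec settings, not taken from the paper.
    """
    frame_rate = sample_rate / hop                      # codec frames per second
    n_frames = int(np.ceil(window_s * frame_rate))      # frames covering the window
    feats = np.zeros((n_frames, n_instruments), dtype=np.float32)
    frame_times = np.arange(n_frames) / frame_rate      # frame k starts at k * hop / sr seconds
    for t, v, inst in zip(onsets_s, velocities, instruments):
        k = int(np.rint(t * frame_rate))                # nearest frame index for this onset
        if 0 <= k < n_frames:
            feats[k, inst] = max(feats[k, inst], v)     # keep the loudest hit per frame
    return feats, frame_times
```

For a four-beat window at 120 BPM, `window_s = 4 * 60 / 120 = 2.0` seconds, which at the assumed 86.13 frames per second gives 173 frames. Working in seconds keeps onset positions exact in time regardless of tempo, at the cost of the window length in frames varying with tempo.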

Findings

1. PCA diffusion outperforms deterministic PCA regression on spectral and transient metrics.
2. An auxiliary RVQ cross-entropy term improves diffusion performance when only a few denoising steps are used.
3. The best number of denoising steps ranges from 6 to 25, depending on the metric.
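Because the best step count depends on the metric, the number of denoising steps is a sampling-time setting rather than a fixed property of the trained model. The sketch below illustrates this with a generic deterministic DDIM-style sampler (eta = 0) whose step count can be swept. The paper's actual sampler, noise schedule, and network parameterization are not given on this page, so the cosine schedule, the x0-prediction interface, and the names `alpha_bar`, `ddim_sample`, and `predict_x0` are assumptions for illustration.

```python
import numpy as np

def alpha_bar(t):
    """Cosine noise schedule: signal fraction at continuous time t in [0, 1] (assumed)."""
    return np.cos(0.5 * np.pi * t) ** 2

def ddim_sample(predict_x0, cond, shape, n_steps, rng):
    """Deterministic DDIM-style sampler (eta = 0) with a configurable step count.

    predict_x0(x_t, t, cond) is a hypothetical denoiser that returns an estimate
    of the clean PCA coordinates; it stands in for the trained network.
    """
    x = rng.standard_normal(shape)                       # start from pure noise at t = 1
    times = np.linspace(1.0, 0.0, n_steps + 1)           # evenly spaced times from 1 down to 0
    for t, s in zip(times[:-1], times[1:]):
        a_t, a_s = alpha_bar(t), alpha_bar(s)
        x0_hat = predict_x0(x, t, cond)                  # network's clean-signal estimate
        eps_hat = (x - np.sqrt(a_t) * x0_hat) / np.sqrt(1.0 - a_t)  # implied noise
        x = np.sqrt(a_s) * x0_hat + np.sqrt(1.0 - a_s) * eps_hat    # deterministic jump to time s
    return x
```

A step-count study would call `ddim_sample` with, for example, `n_steps` in `(6, 12, 25)` and score each output with every metric separately; fewer steps are cheaper but rely more on the accuracy of each single x0 estimate.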

Abstract

Symbolic-control drum generation requires preserving explicit event timing and dynamics while synthesizing acoustically plausible waveforms. We present Sec2Drum-DAC, a conditional latent-diffusion model for symbolic-to-audio drum rendering. The model conditions on event features sampled in physical time at codec-frame locations and predicts standardized principal-component coordinates of frozen DAC summed-codebook embeddings rather than waveform samples. In the evaluated DAC configuration, 72 principal components capture the observed training-frame summed-latent subspace under the stated SVD threshold, yielding a compact continuous denoising target with a deterministic reconstruction path to the 1024-dimensional DAC latent space before waveform decoding. Across 1,733 held-out four-beat windows, PCA diffusion improves paired spectral and transient metrics over deterministic PCA…
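The target representation described above, standardized principal-component coordinates with a deterministic path back to the latent space, can be sketched with NumPy as follows. This is a minimal illustration under assumptions, not the authors' code: the relative singular-value cutoff `rel_tol` is a hypothetical stand-in for the paper's SVD threshold, and the helper names `fit_pca`, `to_pca`, and `from_pca` are illustrative. The test data are synthetic rank-8, 64-dimensional vectors rather than 1024-dimensional DAC summed-codebook embeddings.

```python
import numpy as np

def fit_pca(frames, rel_tol=1e-6):
    """Fit PCA on training-frame latents of shape (n_frames, d).

    Components are kept while singular value / largest singular value > rel_tol;
    this threshold rule is an assumption standing in for the paper's SVD threshold.
    """
    mean = frames.mean(axis=0)
    _, s, vt = np.linalg.svd(frames - mean, full_matrices=False)
    k = int(np.sum(s > rel_tol * s[0]))                  # number of retained components
    basis = vt[:k]                                       # (k, d) principal directions
    coords = (frames - mean) @ basis.T                   # (n_frames, k) raw coordinates
    std = coords.std(axis=0)                             # per-component scale for standardization
    return mean, basis, std

def to_pca(z, mean, basis, std):
    """Map d-dim latents to standardized PCA coordinates (the denoising target)."""
    return ((z - mean) @ basis.T) / std

def from_pca(c, mean, basis, std):
    """Deterministically map standardized coordinates back to the d-dim latent space."""
    return (c * std) @ basis + mean
```

Because the kept basis spans the centered training frames, `from_pca(to_pca(z))` reproduces any frame lying in that subspace, while energy outside it is discarded by the projection. The paper reports 72 retained components for its DAC configuration, so the diffusion model denoises 72 numbers per frame instead of 1024.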
