PocketVAE: A Two-step Model for Groove Generation and Control
Kyungyun Lee, Wonil Kim, Juhan Nam

TL;DR
PocketVAE is a two-step groove generation system for MIDI drum tracks. It can transfer a groove from a reference track, generate one randomly, or generate one under conditions such as genre, and it uses a discrete latent space to improve the modeling of drum note distributions.
Contribution
The paper introduces a novel two-step approach with discrete latent representations for groove generation and control in MIDI drum tracks, combining note editing with velocity and microtiming details.
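The two-step flow can be illustrated with a minimal sketch. It is not the authors' implementation: the grid size, the function names (`note_module`, `detail_modules`, `apply_groove`), and the random placeholder logic are all assumptions for illustration, standing in for the learned modules. It only shows the data flow: notes are edited first, then velocity and microtiming are filled in where notes exist.

```python
import numpy as np

STEPS, DRUMS = 16, 9  # assumed grid: 16 steps per bar, 9 drum classes (hypothetical)

def note_module(template, add_prob=0.1, drop_prob=0.05, rng=None):
    """Placeholder for step 1: add and delete notes on a binary grid."""
    if rng is None:
        rng = np.random.default_rng(0)
    r = rng.random(template.shape)
    added = (template == 0) & (r < add_prob)      # empty cells that gain a note
    dropped = (template == 1) & (r < drop_prob)   # existing notes that are removed
    return np.where(added, 1, np.where(dropped, 0, template))

def detail_modules(notes, rng=None):
    """Placeholder for step 2: velocity and microtiming only where notes exist."""
    if rng is None:
        rng = np.random.default_rng(1)
    velocity = rng.uniform(0.3, 1.0, notes.shape) * notes       # 0 where no note
    microtiming = rng.uniform(-0.5, 0.5, notes.shape) * notes   # offset in steps
    return velocity, microtiming

def apply_groove(template):
    """Run step 1, then step 2 on the edited note score."""
    notes = note_module(template)
    velocity, microtiming = detail_modules(notes)
    return notes, velocity, microtiming

# A rudimentary user template: one drum (column 0) hit on every fourth step.
template = np.zeros((STEPS, DRUMS), dtype=int)
template[::4, 0] = 1
notes, velocity, microtiming = apply_groove(template)
```

In a real system the random placeholders would be replaced by trained models; the ordering of the two steps is the part that mirrors the description above.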
Findings
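A discrete latent space (VQ-VAE) improves modeling of the drum note distribution.
The two-step approach, which edits notes first and then adds velocity and microtiming, enhances groove realism.
Conditioning on controls such as genre enables customizable groove generation.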
Abstract
Creating a good drum track to imitate a skilled performer in digital audio workstations (DAWs) can be a time-consuming process, especially for those unfamiliar with drums. In this work, we introduce PocketVAE, a groove generation system that applies grooves to users' rudimentary MIDI tracks, i.e., templates. Grooves can be transferred from a reference track, generated randomly, or generated with conditions such as genre. Our system, consisting of different modules for each groove component, takes a two-step approach that is analogous to a music creation process. First, the note module updates the user template through addition and deletion of notes; second, the velocity and microtiming modules add details to this generated note score. In order to model the drum notes, we apply a discrete latent representation method via Vector Quantized Variational Autoencoder (VQ-VAE), as drum notes have…
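The discrete latent representation mentioned in the abstract rests on the standard VQ-VAE quantization step: each encoder output vector is replaced by its nearest entry in a learned codebook. The sketch below shows only that generic lookup in NumPy. The function name `quantize`, the toy codebook, and the toy vectors are assumptions for illustration, and nothing here reflects the specific architecture or hyperparameters of PocketVAE.

```python
import numpy as np

def quantize(z, codebook):
    """Map each encoder output vector to its nearest codebook entry.

    z:        (n, d) array of continuous encoder outputs
    codebook: (k, d) array of embedding vectors
    Returns the chosen indices (n,) and the quantized vectors (n, d).
    """
    # Squared Euclidean distance between every z vector and every code.
    dists = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    indices = dists.argmin(axis=1)
    return indices, codebook[indices]

# Toy values, chosen only for illustration.
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]])
z = np.array([[0.9, 1.2], [0.1, -0.2], [-0.8, 0.7]])
indices, z_q = quantize(z, codebook)
print(indices)  # [1 0 2]
```

During training, VQ-VAE additionally uses a straight-through gradient estimator and codebook and commitment losses, which this lookup-only sketch omits.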
Taxonomy
Topics: Music and Audio Processing · Music Technology and Sound Studies · Generative Adversarial Networks and Image Synthesis
