PerTok: Expressive Encoding and Modeling of Symbolic Musical Ideas and Variations

Julian Lenz; Anirudh Mani

arXiv:2410.02060 · cs.SD · October 4, 2024

Open Access

TL;DR

Cadenza is a multi-stage generative framework that uses a novel MIDI encoding to produce expressive, stylistically related musical variations and ideas, combining a transformer-based VAE and a bidirectional encoder.

Contribution

It introduces PerTok, a MIDI encoding method that captures expressive details while significantly reducing sequence length and vocabulary size, and a two-stage generative framework for expressive music synthesis.

Findings

01

Cadenza matches state-of-the-art models in musical quality and expressiveness.

02

It can generate novel musical ideas that remain stylistically related to the input.

03

Objective and human evaluations confirm its versatility and ethical design.

Abstract

We introduce Cadenza, a new multi-stage generative framework for predicting expressive variations of symbolic musical ideas as well as unconditional generations. To accomplish this, we propose a novel MIDI encoding method, PerTok (Performance Tokenizer), that captures minute expressive details whilst reducing sequence length by up to 59% and vocabulary size by up to 95% for polyphonic, monophonic and rhythmic tasks. The proposed framework comprises two sequential stages: 1) Composer and 2) Performer. The Composer model is a transformer-based Variational Autoencoder (VAE) with Rotary Positional Embeddings (RoPE) and an autoregressive decoder modified to more effectively integrate the latent codes of the input musical idea. The Performer model is a bidirectional transformer encoder that is separately trained to predict velocities and microtimings on MIDI sequences. Objective and human…
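The abstract names Rotary Positional Embeddings as part of the Composer model but gives no configuration details here. As a rough illustration of the general technique only, the sketch below rotates consecutive pairs of embedding dimensions by a position-dependent angle. The function names `rope_rotate` and `dot`, the base of 10000, and the toy vectors are assumptions for illustration and are not taken from the paper or its code.

```python
import math


def rope_rotate(vec, position, base=10000.0):
    """Apply a generic rotary positional embedding to one even-length vector.

    Hypothetical helper: each pair (vec[i], vec[i + 1]) is rotated by the
    angle position * base ** (-i / dim). The base value is an assumption.
    """
    dim = len(vec)
    if dim % 2 != 0:
        raise ValueError("RoPE needs an even embedding dimension")
    out = []
    for i in range(0, dim, 2):
        theta = position * base ** (-i / dim)
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        # Standard 2D rotation of the pair (x, y) by theta.
        out.extend([x * cos_t - y * sin_t, x * sin_t + y * cos_t])
    return out


def dot(a, b):
    """Plain dot product, standing in for an attention score."""
    return sum(x * y for x, y in zip(a, b))


# Toy query and key vectors (made-up values).
q = [1.0, 0.5, -0.3, 2.0]
k = [0.2, -1.0, 0.7, 0.4]

# Both pairs of positions are 2 steps apart, so the scores should agree.
score_near = dot(rope_rotate(q, 5), rope_rotate(k, 3))
score_far = dot(rope_rotate(q, 12), rope_rotate(k, 10))
print(abs(score_near - score_far) < 1e-9)
```

The property the example checks is the usual motivation for rotary embeddings: the score between a rotated query and a rotated key depends only on the distance between their positions, not on the absolute positions.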

Taxonomy

Topics: Music and Audio Processing · Music Technology and Sound Studies · Diverse Musicological Studies

Methods: Fast Attention Via Positive Orthogonal Random Features · Performer