BEAT: Tokenizing and Generating Symbolic Music by Uniform Temporal Steps

Lekai Qian; Haoyu Gu; Jingwei Zhao; Ziyu Wang

arXiv:2604.19532 · cs.SD · April 22, 2026

TL;DR

This paper introduces a tokenization method for symbolic music that uses uniform temporal steps (e.g., beats) as the basic unit, and reports improvements in the quality, coherence, and efficiency of music generation models.

Contribution

It proposes a tokenization approach based on uniform time steps, in contrast to traditional event-based methods, and demonstrates its advantages on music modeling tasks.

Findings

01

Improved musical quality and structural coherence in generated music.

02

Higher efficiency and better capture of long-range patterns.

03

Enhanced performance in music continuation and accompaniment tasks.

Abstract

Tokenizing music to fit the general framework of language models is a compelling challenge, especially considering the diverse symbolic structures in which music can be represented (e.g., sequences, grids, and graphs). To date, most approaches tokenize symbolic music as sequences of musical events, such as onsets, pitches, time shifts, or compound note events. This strategy is intuitive and has proven effective in Transformer-based models, but it treats the regularity of musical time implicitly: individual tokens may span different durations, resulting in non-uniform time progression. In this paper, we instead consider whether an alternative tokenization is possible, where a uniform-length musical step (e.g., a beat) serves as the basic unit. Specifically, we encode all events within a single time step at the same pitch as one token, and group tokens explicitly by time step, which…
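The abstract is cut off before the exact token format is given, so the following is only a minimal Python sketch of the general idea, built on our own assumptions. Notes are (onset, pitch, duration) triples on a sixteenth-note grid, one step is one beat of four subdivisions, and each per-step token is a hypothetical (pitch, pattern) pair whose pattern marks every subdivision as silent (0), onset (1), or held (2). The function names event_tokens and step_tokens, the SUBDIV constant, and the REMI-like Shift/Pitch/Dur vocabulary used for contrast are illustrative only. None of them is BEAT's actual vocabulary.

```python
from collections import defaultdict

SUBDIV = 4  # assumption: one step = one beat = 4 sixteenth-note ticks


def event_tokens(notes):
    """Event-style baseline (REMI-like, hypothetical vocabulary).

    Only Shift_* tokens advance time, and by varying amounts; Pitch_* and
    Dur_* tokens advance it by zero, so time progression is non-uniform.
    """
    tokens, t = [], 0
    for onset, pitch, dur in sorted(notes):
        if onset > t:
            tokens.append(f"Shift_{onset - t}")
            t = onset
        tokens.append(f"Pitch_{pitch}")
        tokens.append(f"Dur_{dur}")
    return tokens


def step_tokens(notes, n_steps):
    """Step-grouped sketch: one (pitch, pattern) token per active pitch per step.

    pattern has one character per subdivision: '0' silent, '1' onset, '2' held.
    The outer list has exactly one entry per step, so index == musical time.
    """
    grid = [defaultdict(lambda: [0] * SUBDIV) for _ in range(n_steps)]
    for onset, pitch, dur in notes:
        # Walk every tick the note covers, clipped to the n_steps window.
        for tick in range(onset, min(onset + dur, n_steps * SUBDIV)):
            step, sub = divmod(tick, SUBDIV)
            grid[step][pitch][sub] = 1 if tick == onset else 2
    return [
        [(pitch, "".join(map(str, grid[s][pitch]))) for pitch in sorted(grid[s])]
        for s in range(n_steps)
    ]


# (onset, pitch, duration) in sixteenth-note ticks: C4 for one beat, then
# E4 for half a beat and a G4 held across the next beat boundary.
notes = [(0, 60, 4), (4, 64, 2), (6, 67, 6)]
print(event_tokens(notes))
# ['Pitch_60', 'Dur_4', 'Shift_4', 'Pitch_64', 'Dur_2', 'Shift_2', 'Pitch_67', 'Dur_6']
print(step_tokens(notes, n_steps=3))
# [[(60, '1222')], [(64, '1200'), (67, '0012')], [(67, '2222')]]
```

In the event-style sequence, Shift_4 and Shift_2 advance time by different amounts while the Pitch and Dur tokens advance it by zero. This is the non-uniform time progression the abstract describes. In the step-grouped output, every inner list covers exactly one beat, so a token's position in the outer list maps directly to musical time. The trade-off in this particular sketch is that a sustained note emits a token in every step it spans.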
