Byte Pair Encoding for Symbolic Music
Nathan Fradet, Nicolas Gutowski, Fabien Chhel, Jean-Pierre Briot

TL;DR
This paper applies Byte Pair Encoding (BPE) to symbolic music tokenization. BPE shortens token sequences while enlarging the vocabulary, which makes tokens more expressive, improves results, and speeds up inference in music generation and classification tasks.
Contribution
The paper introduces BPE for symbolic music tokenization, enabling more expressive tokens and more efficient processing than previous small-vocabulary methods.
Findings
BPE significantly reduces sequence length in music tokenization.
Using BPE improves model performance in generation and classification.
BPE enables faster inference with more expressive tokens.
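The trade-off behind these findings, shorter sequences in exchange for a larger vocabulary, can be illustrated with a minimal sketch of the generic BPE merge loop. This is not the authors' implementation: the token names (`Pitch_60`, `Velocity_90`, and so on), the `+` separator for merged tokens, and the helper functions are hypothetical and chosen only for illustration.

```python
from collections import Counter


def count_pairs(seqs):
    """Count adjacent token pairs across all sequences."""
    pairs = Counter()
    for seq in seqs:
        pairs.update(zip(seq, seq[1:]))
    return pairs


def merge_pair(seq, pair, new_token):
    """Replace each non-overlapping occurrence of `pair` with `new_token`."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(new_token)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def learn_bpe(seqs, num_merges):
    """Repeatedly merge the most frequent adjacent pair into a new token."""
    seqs = [list(s) for s in seqs]
    vocab = {tok for s in seqs for tok in s}
    merges = []
    for _ in range(num_merges):
        pairs = count_pairs(seqs)
        if not pairs:
            break
        pair, freq = pairs.most_common(1)[0]
        if freq < 2:  # nothing left that recurs, so stop early
            break
        new_token = pair[0] + "+" + pair[1]  # hypothetical naming scheme
        seqs = [merge_pair(s, pair, new_token) for s in seqs]
        vocab.add(new_token)
        merges.append(pair)
    return seqs, vocab, merges


# Toy corpus (assumption): the same note, described by four attribute
# tokens, repeated four times.
note = ["Pitch_60", "Velocity_90", "Duration_1.0", "TimeShift_1.0"]
corpus = [note * 4]
base_vocab = set(note)

encoded, vocab, merges = learn_bpe(corpus, num_merges=3)

print("sequence length:", len(corpus[0]), "->", len(encoded[0]))
print("vocabulary size:", len(base_vocab), "->", len(vocab))
```

On this toy input, three merges collapse each four-token note into a single token, so the sequence shrinks from 16 tokens to 4 while the vocabulary grows from 4 entries to 7. Real music corpora are far less regular, so the compression achieved in practice depends on the data and on the number of merges learned.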
Abstract
When used with deep learning, the symbolic music modality is often coupled with language model architectures. To do so, the music needs to be tokenized, i.e. converted into a sequence of discrete tokens. This can be achieved by different approaches, as music can be composed of simultaneous tracks and of simultaneous notes with several attributes. Until now, the proposed tokenizations have relied on small vocabularies of tokens describing the note attributes and time events, resulting in fairly long token sequences and a sub-optimal use of the embedding space of language models. Recent research has sought to reduce the overall sequence length by merging embeddings or combining tokens. In this paper, we show that Byte Pair Encoding, a compression technique widely used for natural language, significantly decreases the sequence length while increasing the vocabulary size. By doing so, we…
Taxonomy
Topics: Music and Audio Processing · Music Technology and Sound Studies · Neuroscience and Music Perception
Methods: Byte Pair Encoding
