Multidimensional Byte Pair Encoding: Shortened Sequences for Improved Visual Data Generation
Tim Elsner, Paula Usinger, Julius Nehring-Wirxel, Gregor Kobsik, Victor Czech, Yanjiang He, Isaak Lim, Leif Kobbelt

TL;DR
This paper introduces a multidimensional Byte Pair Encoding method for visual data. Its globally aware tokenization shortens sequences and makes transformer training more efficient, at minimal computational overhead.
Contribution
It extends Byte Pair Encoding to multiple dimensions for images, enabling lossless, content-aware tokenization that enhances transformer performance on visual datasets.
Findings
Shorter, more uniform token sequences improve transformer training.
Compression condenses empty regions into single tokens.
Sequence processing becomes easier and more efficient.
Abstract
In language processing, transformers benefit greatly from text being condensed. This is achieved through a larger vocabulary that captures word fragments instead of plain characters. This is often done with Byte Pair Encoding. In the context of images, tokenisation of visual data is usually limited to regular grids obtained from quantisation methods, without global content awareness. Our work improves tokenisation of visual data by bringing Byte Pair Encoding from 1D to multiple dimensions, as a complementary add-on to existing compression. We achieve this through counting constellations of token pairs and replacing the most frequent token pair with a newly introduced token. The multidimensionality only increases the computation time by a factor of 2 for images, making it applicable even to large datasets like ImageNet within minutes on consumer hardware. This is a lossless…
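The counting-and-replacing step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a 2D grid of integer token ids, considers only two pair constellations (right neighbour and lower neighbour), and uses a hypothetical `EMPTY` marker for cells absorbed into a merged token. The names `count_pairs`, `merge`, `encode`, and `decode` are illustrative only. Having two offsets to count instead of one is consistent with the roughly twofold cost over 1D that the abstract reports for images.

```python
from collections import Counter

EMPTY = -1                   # hypothetical marker: cell absorbed into a merged token
OFFSETS = [(0, 1), (1, 0)]   # constellations: right neighbour, lower neighbour


def count_pairs(grid):
    """Count every (token_a, token_b, dy, dx) constellation in the grid."""
    counts = Counter()
    h, w = len(grid), len(grid[0])
    for i in range(h):
        for j in range(w):
            a = grid[i][j]
            if a == EMPTY:
                continue
            for dy, dx in OFFSETS:
                y, x = i + dy, j + dx
                if y < h and x < w and grid[y][x] != EMPTY:
                    counts[(a, grid[y][x], dy, dx)] += 1
    return counts


def merge(grid, rule, new_token):
    """Replace each non-overlapping occurrence of `rule` with `new_token` in place."""
    a, b, dy, dx = rule
    h, w = len(grid), len(grid[0])
    for i in range(h):
        for j in range(w):
            y, x = i + dy, j + dx
            if y < h and x < w and grid[i][j] == a and grid[y][x] == b:
                grid[i][j] = new_token
                grid[y][x] = EMPTY


def encode(grid, num_merges, first_new_token):
    """Greedily merge the most frequent pair, up to `num_merges` times.

    `first_new_token` must be larger than every token id already in the grid.
    """
    grid = [row[:] for row in grid]
    rules = []
    for k in range(num_merges):
        counts = count_pairs(grid)
        if not counts:
            break
        rule, freq = counts.most_common(1)[0]
        if freq < 2:          # nothing left that repeats
            break
        new_token = first_new_token + k
        merge(grid, rule, new_token)
        rules.append((new_token, rule))
    return grid, rules


def decode(grid, rules):
    """Undo the merges in reverse order, restoring the original grid."""
    grid = [row[:] for row in grid]
    for new_token, (a, b, dy, dx) in reversed(rules):
        for i in range(len(grid)):
            for j in range(len(grid[0])):
                if grid[i][j] == new_token:
                    grid[i][j] = a
                    grid[i + dy][j + dx] = b
    return grid


if __name__ == "__main__":
    image = [
        [0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 1, 1, 0],
    ]
    encoded, rules = encode(image, num_merges=3, first_new_token=2)
    sequence = [t for row in encoded for t in row if t != EMPTY]
    print(len(sequence), "tokens after merging;", "rules:", rules)
    print("lossless:", decode(encoded, rules) == image)
```

In this simplification, pairs are counted only between directly adjacent grid cells, so tokens separated by an absorbed cell are never merged. A fuller treatment would have to track the shapes of merged tokens, which the sketch leaves out.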
Taxonomy
Topics: 3D Modeling in Geospatial Applications · Algorithms and Data Compression · Computer Graphics and Visualization Techniques
