FlexTok: Resampling Images into 1D Token Sequences of Flexible Length

Roman Bachmann; Jesse Allardice; David Mizrahi; Enrico Fini; Oğuzhan Fatih Kar; Elmira Amirloo; Alaaeldin El-Nouby; Amir Zamir; Afshin Dehghan

arXiv:2502.13967 · cs.CV · June 5, 2025

TL;DR

FlexTok introduces a flexible, variable-length 1D image tokenizer that adapts to image complexity, enabling efficient autoregressive image generation with high quality across different token counts.

Contribution

We propose FlexTok, a novel image tokenizer that produces variable-length 1D token sequences, allowing adaptive compression and improved generation quality.

Findings

01  Achieves FID < 2 with 8 to 128 tokens on ImageNet

02  Outperforms TiTok and matches state-of-the-art with fewer tokens

03  Enables coarse-to-fine image description in token space

Abstract

Image tokenization has enabled major advances in autoregressive image generation by providing compressed, discrete representations that are more efficient to process than raw pixels. While traditional approaches use 2D grid tokenization, recent methods like TiTok have shown that 1D tokenization can achieve high generation quality by eliminating grid redundancies. However, these methods typically use a fixed number of tokens and thus cannot adapt to an image's inherent complexity. We introduce FlexTok, a tokenizer that projects 2D images into variable-length, ordered 1D token sequences. For example, a 256x256 image can be resampled into anywhere from 1 to 256 discrete tokens, hierarchically and semantically compressing its information. By training a rectified flow model as the decoder and using nested dropout, FlexTok produces plausible reconstructions regardless of the chosen token…
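The nested-dropout idea mentioned in the abstract can be illustrated with a short sketch. This is not the authors' implementation: it only shows how truncating an ordered token sequence to a random prefix during training pushes the earliest tokens to carry the coarsest information. The power-of-two schedule of prefix lengths, the `None` mask placeholder, and all function names here are assumptions made for illustration.

```python
import random


def keep_length_choices(max_tokens):
    """Candidate prefix lengths: 1, 2, 4, ... up to max_tokens (assumed schedule)."""
    choices = []
    k = 1
    while k <= max_tokens:
        choices.append(k)
        k *= 2
    return choices


def nested_dropout(tokens, keep, mask_token=None):
    """Keep the first `keep` tokens and replace every later token with `mask_token`."""
    if not 1 <= keep <= len(tokens):
        raise ValueError("keep must be between 1 and len(tokens)")
    return tokens[:keep] + [mask_token] * (len(tokens) - keep)


def sample_nested_dropout(tokens, rng):
    """Draw a random prefix length and apply nested dropout with it."""
    keep = rng.choice(keep_length_choices(len(tokens)))
    return nested_dropout(tokens, keep), keep


# Hypothetical example: a 256-token sequence standing in for one encoded image.
rng = random.Random(0)
tokens = list(range(256))
masked, keep = sample_nested_dropout(tokens, rng)
print(f"kept the first {keep} of {len(tokens)} tokens")
```

Because the decoder only ever sees a prefix, any prefix length seen during training can be decoded at inference time, which is what allows the token count to be chosen per image.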

Code & Models

Videos

FlexTok: Resampling Images into 1D Token Sequences of Flexible Length · SlidesLive

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Computer Graphics and Visualization Techniques · Medical Image Segmentation Techniques

Methods: Attention Is All You Need · Linear Layer · Byte Pair Encoding · Layer Normalization · Residual Connection · Dense Connections · Multi-Head Attention · Position-Wise Feed-Forward Layer · Adam · Softmax