BitDance: Scaling Autoregressive Generative Models with Binary Tokens

Yuang Ai; Jiaming Han; Shaobin Zhuang; Weijia Mao; Xuefeng Hu; Ziyan Yang; Zhenheng Yang; Yali Wang; Huaibo Huang; Xiangyu Yue; Hao Chen

arXiv:2602.14041 · cs.CV · March 16, 2026

Open Access 8 Models

TL;DR

BitDance introduces a scalable autoregressive image generation method using binary tokens and diffusion, achieving state-of-the-art results with fewer parameters and significantly faster inference, especially for high-resolution images.

Contribution

The paper proposes BitDance, a novel AR image generator that combines binary tokens with a diffusion head, enabling efficient high-resolution image synthesis with fewer parameters and faster inference.

Findings

01. Achieves an FID of 1.24 on ImageNet 256x256, the best among AR models.

02. Outperforms state-of-the-art parallel AR models while using 5.4x fewer parameters.

03. Generates 1024x1024 images over 30x faster than prior AR models.

Abstract

We present BitDance, a scalable autoregressive (AR) image generator that predicts binary visual tokens instead of codebook indices. With high-entropy binary latents, BitDance lets each token represent up to $2^{256}$ states, yielding a compact yet highly expressive discrete representation. Sampling from such a huge token space is difficult with standard classification. To resolve this, BitDance uses a binary diffusion head: instead of predicting an index with softmax, it employs continuous-space diffusion to generate the binary tokens. Furthermore, we propose next-patch diffusion, a new decoding method that predicts multiple tokens in parallel with high accuracy, greatly speeding up inference. On ImageNet 256x256, BitDance achieves an FID of 1.24, the best among AR models. With next-patch diffusion, BitDance beats state-of-the-art parallel AR models that use 1.4B parameters, while using…
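To make the size of the token space concrete, here is a minimal Python sketch of a 256-bit binary token. It is an illustration only, not the paper's implementation. The helper names (`num_states`, `binarize`, `pack_bits`) are hypothetical, and the sign-based rule for turning continuous values into bits is an assumption: the abstract says only that a continuous-space diffusion head generates the binary tokens, not how the final bits are obtained.

```python
import random

# From the abstract: each token can represent up to 2**256 states.
BITS_PER_TOKEN = 256


def num_states(bits: int) -> int:
    """Number of distinct values a token with `bits` binary channels can take."""
    return 2 ** bits


def binarize(values):
    """Map continuous values to bits by sign.

    Assumption for illustration: the source does not specify this rule.
    """
    return [1 if v > 0 else 0 for v in values]


def pack_bits(bits):
    """Pack a list of bits (most significant first) into one integer index."""
    index = 0
    for b in bits:
        index = (index << 1) | b
    return index


# Stand-in for the continuous output of a diffusion head (random values here).
rng = random.Random(0)
continuous = [rng.gauss(0.0, 1.0) for _ in range(BITS_PER_TOKEN)]

token_bits = binarize(continuous)
token_index = pack_bits(token_bits)

# A softmax classifier would need one logit per state, which is infeasible:
print(num_states(BITS_PER_TOKEN) > 10 ** 77)
# Producing 256 bits directly sidesteps enumerating the states.
print(len(token_bits), token_index < num_states(BITS_PER_TOKEN))
```

The point of the sketch is the contrast in output size: a classification head would need one logit for each of the 2^256 states, whereas a head that emits the token's 256 channels directly never has to enumerate them.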

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Cell Image Analysis Techniques · Handwritten Text Recognition Techniques