BitDance: Scaling Autoregressive Generative Models with Binary Tokens
Yuang Ai, Jiaming Han, Shaobin Zhuang, Weijia Mao, Xuefeng Hu, Ziyan Yang, Zhenheng Yang, Yali Wang, Huaibo Huang, Xiangyu Yue, Hao Chen

TL;DR
BitDance introduces a scalable autoregressive image generation method using binary tokens and diffusion, achieving state-of-the-art results with fewer parameters and significantly faster inference, especially for high-resolution images.
Contribution
The paper proposes BitDance, an AR image generator that predicts binary tokens with a diffusion head, enabling efficient high-resolution image synthesis with fewer parameters and faster inference.
Findings
Achieves an FID of 1.24 on ImageNet 256x256, the best among AR models.
Outperforms state-of-the-art parallel AR models (1.4B parameters) while using 5.4x fewer parameters.
Generates 1024x1024 images over 30x faster than prior AR models.
Abstract
We present BitDance, a scalable autoregressive (AR) image generator that predicts binary visual tokens instead of codebook indices. With high-entropy binary latents, BitDance lets each token represent up to states, yielding a compact yet highly expressive discrete representation. Sampling from such a huge token space is difficult with standard classification. To resolve this, BitDance uses a binary diffusion head: instead of predicting an index with softmax, it employs continuous-space diffusion to generate the binary tokens. Furthermore, we propose next-patch diffusion, a new decoding method that predicts multiple tokens in parallel with high accuracy, greatly speeding up inference. On ImageNet 256x256, BitDance achieves an FID of 1.24, the best among AR models. With next-patch diffusion, BitDance beats state-of-the-art parallel AR models that use 1.4B parameters, while using…
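To make the abstract's point about the size of the token space concrete, the following minimal sketch counts how many states a binary token can take and shows the equivalence between a bit vector and a codebook-style index. The function names and the bit widths used here are illustrative assumptions, not taken from the paper or its released code.

```python
from typing import List


def num_states(num_bits: int) -> int:
    """Number of distinct values a token made of `num_bits` binary channels can take."""
    return 2 ** num_bits


def bits_to_index(bits: List[int]) -> int:
    """Pack 0/1 bits (most significant first) into a single codebook-style index."""
    index = 0
    for b in bits:
        index = (index << 1) | b
    return index


def index_to_bits(index: int, num_bits: int) -> List[int]:
    """Unpack an index back into `num_bits` bits (most significant first)."""
    return [(index >> shift) & 1 for shift in range(num_bits - 1, -1, -1)]


if __name__ == "__main__":
    # Illustrative bit widths only; they are not the paper's configuration.
    for d in (8, 16, 32):
        # A softmax over indices needs one logit per state (2**d),
        # while predicting the bits themselves needs only d outputs.
        print(f"{d} bits -> {num_states(d)} states, {d} per-bit outputs")

    token = [1, 0, 1, 1]
    idx = bits_to_index(token)
    assert index_to_bits(idx, len(token)) == token
```

The number of classification logits grows exponentially with the bit width, while the number of per-bit outputs grows only linearly. This is the difficulty the abstract attributes to standard classification over such a large token space.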
Code & Models
- 🤗 shallowdream204/BitDance-14B-16x (model) · 50 dl · ♡ 89
- 🤗 shallowdream204/BitDance-14B-64x (model) · 84 dl · ♡ 63
- 🤗 shallowdream204/BitDance-ImageNet (model) · ♡ 1
- 🤗 shallowdream204/BitDance-Tokenizer (model) · ♡ 1
- 🤗 BiliSakura/BitDance-14B-16x-diffusers (model) · 53 dl · ♡ 8
- 🤗 BiliSakura/BitDance-14B-64x-diffusers (model) · 14 dl · ♡ 5
- 🤗 BiliSakura/BitDance-Tokenizer-diffusers (model)
- 🤗 BiliSakura/BitDance-ImageNet-diffusers (model)
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Cell Image Analysis Techniques · Handwritten Text Recognition Techniques
