Autoregressive Image Generation with Masked Bit Modeling
Qihang Yu, Qihao Liu, Ju He, Xinyang Zhang, Yang Liu, Liang-Chieh Chen, Xi Chen

TL;DR
This paper introduces masked Bit AutoRegressive modeling (BAR), a scalable discrete image generation framework. By using larger codebooks and bit-level autoregressive prediction, BAR closes the performance gap with continuous methods and achieves state-of-the-art results, with a reported gFID of 0.99 on ImageNet-256.
Contribution
The paper proposes BAR, a novel scalable discrete generation method using masked bit modeling with autoregressive transformers, outperforming existing approaches in image quality and efficiency.
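To make the idea of working at the bit level concrete, here is a minimal sketch of how a codebook index can be written as a fixed-length bit string and recovered from it. It is an illustration only, not the authors' implementation: the function names `index_to_bits` and `bits_to_index` and the most-significant-bit-first ordering are assumptions introduced here.

```python
from typing import List


def index_to_bits(index: int, num_bits: int) -> List[int]:
    """Write a codebook index as num_bits binary digits, most significant first.

    Hypothetical helper: the bit ordering is an assumption, not taken from the paper.
    """
    if not 0 <= index < (1 << num_bits):
        raise ValueError("index does not fit in num_bits bits")
    return [(index >> shift) & 1 for shift in range(num_bits - 1, -1, -1)]


def bits_to_index(bits: List[int]) -> int:
    """Inverse of index_to_bits: fold the bits back into an integer index."""
    index = 0
    for bit in bits:
        index = (index << 1) | bit
    return index


if __name__ == "__main__":
    # A codebook with 2**16 entries needs 16 binary targets per token,
    # whereas a categorical output over the same codebook has 65,536 classes.
    num_bits = 16
    token = 40_000
    bits = index_to_bits(token, num_bits)
    print(bits, bits_to_index(bits) == token)
```

The point of the sketch is the scaling: the number of binary targets grows with log2 of the codebook size, while a categorical output grows with the codebook size itself.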
Findings
BAR achieves a gFID of 0.99 on ImageNet-256.
Discrete tokenizers with larger codebooks can match or surpass their continuous counterparts.
BAR reduces sampling costs and converges faster than prior approaches.
Abstract
This paper challenges the dominance of continuous pipelines in visual generation. We systematically investigate the performance gap between discrete and continuous methods. Contrary to the belief that discrete tokenizers are intrinsically inferior, we demonstrate that the disparity arises primarily from the total number of bits allocated in the latent space (i.e., the compression ratio). We show that scaling up the codebook size effectively bridges this gap, allowing discrete tokenizers to match or surpass their continuous counterparts. However, existing discrete generation methods struggle to capitalize on this insight, suffering from performance degradation or prohibitive training costs with a scaled codebook. To address this, we propose masked Bit AutoRegressive modeling (BAR), a scalable framework that supports arbitrary codebook sizes. By equipping an autoregressive transformer with…
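The bit-budget argument in the abstract can be sketched with simple arithmetic: a discrete latent of N tokens drawn from a codebook of size C carries N · log2(C) bits, and the compression ratio is the raw image bit count divided by that figure. The snippet below is a rough illustration under stated assumptions; the 16×16 token grid, the codebook sizes, and the helper names are hypothetical and not taken from the paper.

```python
import math


def discrete_latent_bits(num_tokens: int, codebook_size: int) -> float:
    """Total bits carried by num_tokens indices into a codebook of the given size."""
    return num_tokens * math.log2(codebook_size)


def compression_ratio(height: int, width: int, latent_bits: float,
                      channels: int = 3, bits_per_channel: int = 8) -> float:
    """Raw image bits divided by latent bits (assumes 8-bit RGB input)."""
    raw_bits = height * width * channels * bits_per_channel
    return raw_bits / latent_bits


if __name__ == "__main__":
    num_tokens = 16 * 16  # hypothetical token grid for a 256x256 image
    for log2_size in (10, 16, 32):  # hypothetical codebook sizes 2**10, 2**16, 2**32
        bits = discrete_latent_bits(num_tokens, 2 ** log2_size)
        ratio = compression_ratio(256, 256, bits)
        print(f"codebook 2^{log2_size}: {bits:.0f} latent bits, ratio {ratio:.1f}x")
```

With the token count held fixed, each doubling of the codebook adds one bit per token, so a larger codebook lowers the compression ratio without changing the sequence length.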
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Image Enhancement Techniques · Advanced Data Compression Techniques
