Rethinking Discrete Tokens: Treating Them as Conditions for Continuous Autoregressive Image Synthesis
Peng Zheng, Junke Wang, Yi Chang, Yizhou Yu, Rui Ma, and Zuxuan Wu

TL;DR
DisCon treats discrete tokens as conditioning signals for continuous image representations rather than as generation targets. This avoids quantization-induced information loss and eases the difficulty of modeling an unbounded continuous space directly, improving image generation quality.
Contribution
The paper proposes DisCon, a new approach that models continuous representations conditioned on discrete tokens, enhancing autoregressive image synthesis without quantization-induced information loss.
Findings
Achieves a gFID score of 1.38 on ImageNet 256x256
Outperforms state-of-the-art autoregressive models
Effectively models continuous representations conditioned on discrete tokens
Abstract
Recent advances in large language models (LLMs) have spurred interest in encoding images as discrete tokens and leveraging autoregressive (AR) frameworks for visual generation. However, the quantization process in AR-based visual generation models inherently introduces information loss that degrades image fidelity. To mitigate this limitation, recent studies have explored predicting continuous tokens autoregressively. Unlike discrete tokens, which reside in a structured and bounded space, continuous representations exist in an unbounded, high-dimensional space, making density estimation more challenging and increasing the risk of generating out-of-distribution artifacts. Based on these observations, this work introduces DisCon (Discrete-Conditioned Continuous Autoregressive Model), a novel framework that reinterprets discrete tokens as conditional signals rather than generation targets.…
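The quantization loss mentioned in the abstract can be illustrated with a toy nearest-neighbor vector quantizer. The sketch below is not the paper's tokenizer or model: the latent dimension, codebook size, random data, and the `quantize` helper are all assumptions chosen for illustration. It shows that a discrete index alone cannot reproduce the continuous latent, while the discrete token together with a continuous quantity tied to it can.

```python
import numpy as np

def quantize(latents, codebook):
    """Map each continuous latent to its nearest codebook entry (L2 distance)."""
    # Pairwise squared distances, shape (num_latents, codebook_size).
    dists = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    indices = dists.argmin(axis=1)
    return indices, codebook[indices]

rng = np.random.default_rng(0)
latents = rng.normal(size=(1024, 8))   # hypothetical continuous image tokens
codebook = rng.normal(size=(64, 8))    # hypothetical codebook of 64 entries

indices, quantized = quantize(latents, codebook)

# Information lost when only the discrete index is kept.
quantization_mse = float(((latents - quantized) ** 2).mean())

# Illustrative decomposition (an assumption, not DisCon's parameterization):
# the discrete token plus a continuous residual recovers the latent exactly.
residual = latents - quantized
reconstructed = quantized + residual

print(f"quantization MSE: {quantization_mse:.4f}")
print("exact recovery with residual:", np.allclose(reconstructed, latents))
```

The residual form is only one simple way to make the point. The abstract describes modeling the continuous representation conditioned on the discrete token, and the exact conditioning mechanism is not given in the text above.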
Taxonomy
Topics: Image Retrieval and Classification Techniques
