MixAR: Mixture Autoregressive Image Generation
Jinyuan Hu, Jiayou Zhang, Shaobo Cui, Kun Zhang, Guangyi Chen

TL;DR
MixAR introduces a novel mixture training framework that combines discrete tokens and continuous autoregressive modeling to improve image generation quality, addressing limitations of quantization and unstructured continuous spaces.
Contribution
The paper proposes MixAR, a new framework that integrates discrete token guidance with continuous autoregressive models for enhanced image generation.
Findings
DC-Mix offers a good balance between efficiency and fidelity.
TI-Mix improves training and inference consistency.
MixAR achieves higher quality image generation.
Abstract
Autoregressive (AR) approaches, which represent images as sequences of discrete tokens from a finite codebook, have achieved remarkable success in image generation. However, the quantization process and the limited codebook size inevitably discard fine-grained information, placing bottlenecks on fidelity. Motivated by this limitation, recent studies have explored autoregressive modeling in continuous latent spaces, which offers higher generation quality. Yet, unlike discrete tokens constrained by a fixed codebook, continuous representations lie in a vast and unstructured space, posing significant challenges for efficient autoregressive modeling. To address these challenges, we introduce MixAR, a novel framework that leverages mixture training paradigms to inject discrete tokens as prior guidance for continuous AR modeling. MixAR is a factorized formulation that leverages discrete tokens…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsGenerative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning · Face recognition and analysis
