Autoregressive Visual Generation Needs a Prologue

Bowen Zheng; Weijian Luo; Guang Yang; Colin Zhang; Tianyang Hu

arXiv:2605.06137·cs.CV·May 8, 2026

TL;DR

Prologue improves autoregressive image generation by prepending a small set of learned prologue tokens to the visual token sequence, raising generation quality without compromising reconstruction fidelity.

Contribution

The paper proposes a decoupled design: prologue tokens are trained exclusively with the autoregressive cross-entropy loss, while visual tokens remain dedicated to reconstruction. This improves generation quality without affecting reconstruction performance.

Findings

01

Prologue-Base reduces gFID from 21.01 to 10.75 on ImageNet 256x256 without classifier-free guidance, while keeping reconstruction almost unchanged.

02

Prologue-Large achieves rFID of 0.99 and gFID of 1.46 without auxiliary supervision.

03

Prologue tokens exhibit emergent semantic structure, with high linear probing accuracy.

Abstract

In this work, we propose Prologue, an approach to bridging the reconstruction-generation gap in autoregressive (AR) image generation. Instead of modifying visual tokens to satisfy both reconstruction and generation, Prologue generates a small set of prologue tokens prepended to the visual token sequence. These prologue tokens are trained exclusively with the AR cross-entropy (CE) loss, while visual tokens remain dedicated to reconstruction. This decoupled design lets us optimize generation through the AR model's true distribution without affecting reconstruction quality, which we further formalize from an ELBO perspective. On ImageNet 256x256, Prologue-Base reduces gFID from 21.01 to 10.75 without classifier-free guidance while keeping reconstruction almost unchanged; Prologue-Large reaches a competitive rFID of 0.99 and gFID of 1.46 using a standard AR model without auxiliary semantic…
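The sequence layout described in the abstract can be illustrated with a toy example. This is a minimal sketch, not the authors' implementation: the helper names (`build_sequence`, `next_token_cross_entropy`), the codebook sizes, and the choice to place both token types in one shared vocabulary via an offset are all assumptions made for illustration. It shows only how prologue tokens might be prepended to visual tokens and how a next-token cross-entropy over that sequence is computed; it does not model the tokenizer or how gradients reach the prologue tokens.

```python
import math

def build_sequence(prologue_ids, visual_ids, visual_offset):
    """Prepend prologue tokens to visual tokens (hypothetical layout).

    visual_offset shifts visual ids so the two assumed codebooks
    do not collide inside one shared vocabulary.
    """
    return list(prologue_ids) + [v + visual_offset for v in visual_ids]

def next_token_cross_entropy(logits, targets):
    """Mean cross-entropy, where logits[t] is the score row for targets[t]."""
    total = 0.0
    for row, target in zip(logits, targets):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[target]
    return total / len(targets)

# Toy sizes (assumptions): 4 prologue codes, 8 visual codes.
PROLOGUE_VOCAB, VISUAL_VOCAB = 4, 8
VOCAB = PROLOGUE_VOCAB + VISUAL_VOCAB

# Two prologue tokens followed by three visual tokens.
seq = build_sequence([1, 3], [0, 5, 2], visual_offset=PROLOGUE_VOCAB)

# A stand-in "model" that predicts uniformly over the vocabulary.
logits = [[0.0] * VOCAB for _ in seq]
loss = next_token_cross_entropy(logits, seq)
print(seq, round(loss, 4))
```

With uniform predictions the loss equals the log of the vocabulary size, which gives a quick sanity check for the cross-entropy helper before swapping in a real model's logits.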

Code & Models

Models

Zyriix/prologue (Hugging Face model)
