Mirai: Autoregressive Visual Generation Needs Foresight
Yonghao Yu, Lang Huang, Zerun Wang, Runyi Li, Toshihiko Yamasaki

TL;DR
Mirai is a foresight-based training framework for autoregressive visual generators that speeds up convergence and improves image quality with no architectural changes and no extra inference overhead.
Contribution
Mirai demonstrates how injecting future information aligned with internal representations improves autoregressive image generation.
Findings
Mirai accelerates convergence of LlamaGen-B by up to 10×.
Mirai reduces ImageNet generation FID from 5.34 to 4.34.
Foresight aligned with internal representations enhances causal modeling.
Abstract
Autoregressive (AR) visual generators model images as sequences of discrete tokens and are trained with a next-token likelihood objective. This strict causal supervision optimizes each step based only on the immediate next token, which can weaken global coherence and slow convergence. We investigate whether foresight, training signals that originate from later tokens, can improve autoregressive visual generation. We conduct a series of controlled diagnostics along the injection level, foresight layout, and foresight source axes, revealing a key insight: aligning foresight with AR models' internal representations on the 2D image grid improves causal modeling. We formulate this insight with Mirai (meaning "future" in Japanese), a general framework that injects future information into AR training with no architecture change and no extra inference overhead: Mirai-E uses explicit foresight…
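To make the abstract's setup concrete, the following is a minimal NumPy sketch of the standard next-token likelihood objective over a raster-ordered token grid. It does not implement Mirai: the function names, grid size, and vocabulary size are hypothetical, and the helper `grid_neighbors_ahead` only illustrates which 2D-grid neighbours of a position appear later in the sequence, that is, where a foresight signal could originate.

```python
import numpy as np


def next_token_nll(logits, tokens):
    """Mean next-token negative log-likelihood.

    logits: (T-1, V) array; row t scores token t+1 given tokens[: t + 1].
    tokens: (T,) integer array of discrete image tokens in raster order.
    """
    targets = tokens[1:]
    # Numerically stable log-softmax over the vocabulary axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Each step is supervised only by its immediate next token.
    return float(-log_probs[np.arange(len(targets)), targets].mean())


def grid_neighbors_ahead(t, height, width):
    """Raster indices of the right and lower neighbours of position t.

    Both lie later in the raster-ordered sequence, so they are examples of
    "future" positions on the 2D image grid (illustrative assumption only).
    """
    row, col = divmod(t, width)
    later = []
    if col + 1 < width:
        later.append(t + 1)
    if row + 1 < height:
        later.append(t + width)
    return later


# Hypothetical toy sizes: a 4x4 token grid with a 16-entry codebook.
H, W, V = 4, 4, 16
rng = np.random.default_rng(0)
tokens = rng.integers(0, V, size=H * W)
random_logits = rng.normal(size=(H * W - 1, V))

loss = next_token_nll(random_logits, tokens)
# With all-zero logits the model is uniform, so the loss equals log(V).
uniform_loss = next_token_nll(np.zeros((H * W - 1, V)), tokens)
```

In this sketch the lower neighbour of position `t` sits at index `t + W`, so it is spatially adjacent on the grid yet `W` steps away in the sequence, which is one way to see why supervision from the immediate next token alone may say little about nearby image structure.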
