Mirai: Autoregressive Visual Generation Needs Foresight

Yonghao Yu; Lang Huang; Zerun Wang; Runyi Li; Toshihiko Yamasaki

arXiv:2601.14671·cs.CV·April 14, 2026

TL;DR

Mirai is a foresight-based training framework for autoregressive visual generators. It accelerates LlamaGen-B convergence by up to 10× and reduces ImageNet generation FID from 5.34 to 4.34, with no architectural changes and no extra inference overhead.

Contribution

Mirai demonstrates how injecting future information aligned with internal representations improves autoregressive image generation.

Findings

01 Mirai accelerates convergence of LlamaGen-B by up to 10×.

02 Mirai reduces ImageNet generation FID from 5.34 to 4.34.

03 Foresight aligned with internal representations enhances causal modeling.

Abstract

Autoregressive (AR) visual generators model images as sequences of discrete tokens and are trained with a next-token likelihood objective. This strict causal supervision optimizes each step based only on the immediate next token, which can weaken global coherence and slow convergence. We investigate whether foresight, training signals that originate from later tokens, can improve autoregressive visual generation. We conduct a series of controlled diagnostics along the injection level, foresight layout, and foresight source axes, revealing a key insight: aligning foresight with AR models' internal representations on the 2D image grid improves causal modeling. We formulate this insight with Mirai (meaning "future" in Japanese), a general framework that injects future information into AR training with no architecture change and no extra inference overhead: Mirai-E uses explicit foresight…
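To make the idea of "foresight" concrete, the following is a minimal, illustrative sketch in plain Python. It is not Mirai's actual objective, which the truncated abstract above does not specify. It assumes a standard next-token negative log-likelihood plus a hypothetical auxiliary term that aligns the hidden state at each raster position with a feature taken from a later position on the 2D grid (here, the cell one row below). The function names, the cosine-distance alignment, the one-row offset, and the weight `lam` are all assumptions made for illustration.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def next_token_nll(logits_seq, tokens):
    """Mean negative log-likelihood of tokens[t + 1] given the logits at step t."""
    steps = len(tokens) - 1
    total = 0.0
    for t in range(steps):
        total -= log_softmax(logits_seq[t])[tokens[t + 1]]
    return total / steps

def grid_foresight_targets(height, width, offset=1):
    """Hypothetical foresight layout: for each raster index, the index of the
    cell `offset` rows below on the 2D grid, or None if it falls off the grid."""
    targets = []
    for idx in range(height * width):
        row, col = divmod(idx, width)
        below = row + offset
        targets.append(below * width + col if below < height else None)
    return targets

def cosine(a, b):
    """Cosine similarity, with a small floor on the norms to avoid division by zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = max(math.sqrt(sum(x * x for x in a)), 1e-12)
    norm_b = max(math.sqrt(sum(y * y for y in b)), 1e-12)
    return dot / (norm_a * norm_b)

def foresight_alignment_loss(hidden, future_feats, targets):
    """Assumed auxiliary loss: mean (1 - cosine) between the hidden state at
    position i and the feature at its future target position."""
    losses = [1.0 - cosine(hidden[i], future_feats[j])
              for i, j in enumerate(targets) if j is not None]
    return sum(losses) / len(losses) if losses else 0.0

def total_loss(logits_seq, tokens, hidden, future_feats, targets, lam=0.5):
    """Next-token NLL plus a weighted foresight term (lam is an assumed weight)."""
    return (next_token_nll(logits_seq, tokens)
            + lam * foresight_alignment_loss(hidden, future_feats, targets))
```

Because the auxiliary term in this sketch only touches the training loss, sampling would proceed exactly as in an ordinary autoregressive model, which is consistent with the abstract's claim of no extra inference overhead.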
