Beyond Next-Token: Next-X Prediction for Autoregressive Visual Generation

Sucheng Ren, Qihang Yu, Ju He, Xiaohui Shen, Alan Yuille, Liang-Chieh Chen

arXiv:2502.20388 · cs.CV · March 24, 2025

TL;DR

This paper introduces xAR, a flexible autoregressive framework for visual generation. It generalizes next-token prediction to next-entity prediction, where an entity can be a single patch, a group of patches, a scale, or the whole image, and it reduces exposure bias. The result is faster and more accurate image synthesis.

Contribution

xAR generalizes the notion of a token to an entity, reformulates discrete token classification as continuous entity regression, and trains on noisy context to mitigate exposure bias in visual autoregressive models.
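The following is a minimal sketch of what continuous entity regression with noisy context training can look like. Each entity is regressed with a flow-matching velocity target, and every entity in a training sequence is corrupted at its own random noise level, so the model learns to predict from imperfect context instead of ground truth. The sketch assumes a linear (rectified-flow) noise path, uniformly sampled timesteps, Gaussian noise, and plain Python lists as entity vectors. The helper names and the model(context, x_t, t) signature are hypothetical. The official OliverRensu/xAR implementation may differ in noise schedule, parameterization, and how context is batched.

```python
import random

# Hypothetical sketch: linear (rectified-flow) path, uniform t, Gaussian noise.
# This is not the official xAR code; names and signatures are illustrative.

def interpolate(x_data, x_noise, t):
    """Linear path: t = 0 gives pure noise, t = 1 gives the clean entity."""
    return [t * a + (1.0 - t) * b for a, b in zip(x_data, x_noise)]

def make_training_example(entities, rng):
    """Noisy inputs, velocity targets and timesteps for one entity sequence.

    Each entity draws its own timestep, so the context available to entity n
    (entities 0..n-1) is itself noisy rather than ground truth.
    """
    noisy_inputs, targets, times = [], [], []
    for x1 in entities:
        x0 = [rng.gauss(0.0, 1.0) for _ in x1]  # Gaussian noise sample
        t = rng.random()                         # per-entity noise level in [0, 1)
        noisy_inputs.append(interpolate(x1, x0, t))
        targets.append([a - b for a, b in zip(x1, x0)])  # velocity d(x_t)/dt
        times.append(t)
    return noisy_inputs, targets, times

def noisy_context_loss(model, noisy_inputs, targets, times):
    """Mean squared velocity error, averaged over entities.

    `model(context, x_t, t)` is a hypothetical callable that sees only the
    noisy preceding entities and predicts the velocity of the current one.
    """
    total = 0.0
    for n, (x_t, v, t) in enumerate(zip(noisy_inputs, targets, times)):
        pred = model(noisy_inputs[:n], x_t, t)
        total += sum((p - q) ** 2 for p, q in zip(pred, v)) / len(v)
    return total / len(targets)

def sample_entities(model, num_entities, dim, steps, rng):
    """Generate entities one at a time, integrating each from noise with Euler steps."""
    generated = []
    for _ in range(num_entities):
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for s in range(steps):
            t = s / steps
            v = model(list(generated), x, t)  # condition on previously generated entities
            x = [a + b / steps for a, b in zip(x, v)]
        generated.append(x)
    return generated
```

One design point: training conditions each entity on noisy context, while sample_entities conditions on the model's own generated entities. Exposing the model to corrupted context during training is what is meant to make it tolerant of its own errors at inference, which is the exposure-bias problem the abstract describes.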

Findings

01

xAR-B (172M parameters) outperforms the larger DiT-XL and SiT-XL (675M) on ImageNet-256 generation while offering faster inference.

02

xAR-H achieves a new state-of-the-art FID of 1.24 on ImageNet-256.

03

xAR-B runs inference about 20 times faster than DiT-XL and SiT-XL while maintaining high sample quality.

Abstract

Autoregressive (AR) modeling, known for its next-token prediction paradigm, underpins state-of-the-art language and visual generative models. Traditionally, a "token" is treated as the smallest prediction unit, often a discrete symbol in language or a quantized patch in vision. However, the optimal token definition for 2D image structures remains an open question. Moreover, AR models suffer from exposure bias, where teacher forcing during training leads to error accumulation at inference. In this paper, we propose xAR, a generalized AR framework that extends the notion of a token to an entity X, which can represent an individual patch token, a cell (a k × k grouping of neighboring patches), a subsample (a non-local grouping of distant patches), a scale (coarse-to-fine resolution), or even a whole image. Additionally, we reformulate discrete token classification as continuous…
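To make the entity types concrete, here is a minimal Python sketch that partitions an h × w grid of patch positions into token, cell, subsample, and whole-image entities. The function name group_entities, the raster ordering of entities, and the default k = 2 are assumptions for illustration and are not taken from the xAR code. The scale entity is left out because it involves resampling the image to coarser resolutions rather than regrouping patches.

```python
# Hypothetical helper: entity ordering and the default k are assumptions,
# not the official xAR implementation.

def group_entities(h, w, mode, k=2):
    """Return a list of entities; each entity is a list of (row, col) patch positions."""
    if mode == "token":
        # Every patch is its own entity.
        return [[(r, c)] for r in range(h) for c in range(w)]
    if mode == "image":
        # The whole image is a single entity.
        return [[(r, c) for r in range(h) for c in range(w)]]
    if h % k or w % k:
        raise ValueError("h and w must be divisible by k")
    if mode == "cell":
        # Local k x k blocks of neighboring patches, blocks in raster order.
        return [
            [(br * k + i, bc * k + j) for i in range(k) for j in range(k)]
            for br in range(h // k)
            for bc in range(w // k)
        ]
    if mode == "subsample":
        # Non-local groups: entity (i, j) takes every k-th patch from offset (i, j).
        return [
            [(r, c) for r in range(i, h, k) for c in range(j, w, k)]
            for i in range(k)
            for j in range(k)
        ]
    raise ValueError(f"unknown mode: {mode}")
```

Whatever the grouping, every patch lands in exactly one entity. On a 4 × 4 grid with k = 2, an AR model would therefore predict the same 16 patches in 16 steps (tokens), 4 steps (cells or subsamples), or 1 step (whole image).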

Code & Models

Repositories

OliverRensu/xAR (PyTorch, official)

Models

OliverRen/xAR (Hugging Face model)

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning

Methods: Balanced Selection