Rethinking Training Dynamics in Scale-wise Autoregressive Generation

Gengze Zhou; Chongjian Ge; Hao Tan; Feng Liu; Yicong Hong

arXiv:2512.06421 · cs.CV · December 9, 2025

TL;DR

This paper identifies key training challenges in scale-wise autoregressive models and proposes Self-Autoregressive Refinement (SAR), a method that improves generation quality by aligning training and inference through lightweight rollouts and contrastive supervision.

Contribution

The paper introduces SAR, a novel post-training technique combining Stagger-Scale Rollout and Contrastive Student-Forcing Loss to enhance autoregressive model training and generation quality.

Findings

1. SAR reduces FID by 5.2% on ImageNet 256 within 10 epochs.
2. SAR improves generation quality with minimal computational overhead.
3. The method is scalable and effective as a post-training enhancement.

Abstract

Recent advances in autoregressive (AR) generative models have produced increasingly powerful systems for media synthesis. Among them, next-scale prediction has emerged as a popular paradigm, where models generate images in a coarse-to-fine manner. However, scale-wise AR models suffer from exposure bias, which undermines generation quality. We identify two primary causes of this issue: (1) train-test mismatch, where the model must rely on its own imperfect predictions during inference, and (2) imbalance in scale-wise learning difficulty, where certain scales exhibit disproportionately higher optimization complexity. Through a comprehensive analysis of training dynamics, we propose Self-Autoregressive Refinement (SAR) to address these limitations. SAR introduces a Stagger-Scale Rollout (SSR) mechanism that performs lightweight autoregressive rollouts to expose the model to its own…
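The train-test mismatch named as cause (1) can be illustrated with a toy sketch. This is not the paper's SAR method or its Stagger-Scale Rollout. It only shows, under assumed dynamics, why a model that looks accurate when fed ground-truth inputs can drift when fed its own predictions. The functions `true_next` and `model_next`, and the constants 2.0 and 2.1, are hypothetical stand-ins for a ground-truth coarse-to-fine process and a slightly imperfect learned predictor.

```python
# Toy illustration of exposure bias (train-test mismatch).
# All dynamics here are assumptions for illustration, not the paper's model.

def true_next(x):
    """Hypothetical ground-truth transition from one scale to the next."""
    return 2.0 * x


def model_next(x):
    """Hypothetical learned predictor with a 5% multiplicative error."""
    return 2.1 * x


def teacher_forced_errors(x0, num_scales):
    """Relative error per scale when every input is the ground truth."""
    errors = []
    x = x0
    for _ in range(num_scales):
        target = true_next(x)
        errors.append(abs(model_next(x) - target) / abs(target))
        x = target  # the next input is the ground truth, as in training
    return errors


def rollout_errors(x0, num_scales):
    """Relative error per scale when the model consumes its own outputs."""
    errors = []
    x_true = x0
    x_model = x0
    for _ in range(num_scales):
        x_true = true_next(x_true)
        x_model = model_next(x_model)  # the next input is the model's own output
        errors.append(abs(x_model - x_true) / abs(x_true))
    return errors


tf_errors = teacher_forced_errors(1.0, 5)
ro_errors = rollout_errors(1.0, 5)
print("teacher-forced:", [round(e, 4) for e in tf_errors])
print("self-rollout:  ", [round(e, 4) for e in ro_errors])
```

In this toy setting the teacher-forced error stays at about 5% at every scale, while the self-rollout error compounds across scales. A training scheme that only ever sees ground-truth inputs never observes the second regime, which is the gap that rollout-based training aims to close.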

Code & Models

Models

ZGZzz/SAR (Hugging Face model)

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Image Enhancement Techniques · Face recognition and analysis