Distilled Decoding 2: One-step Sampling of Image Auto-regressive Models with Conditional Score Distillation

Enshu Liu; Qian Chen; Xuefei Ning; Shengen Yan; Guohao Dai; Zinan Lin; Yu Wang

arXiv:2510.21003 · cs.LG · October 27, 2025

TL;DR

This paper introduces DD2, a method that enables one-step sampling in image auto-regressive models without relying on a pre-defined mapping. It greatly reduces sampling cost while keeping image quality close to that of the original model.

Contribution

DD2 advances one-step sampling for image AR models through conditional score distillation, removing the need for a pre-defined mapping and narrowing the performance gap to the original AR model.
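As a rough illustration of what score distillation means here, the sketch below trains a toy one-step generator so that, for each condition c (standing in for the previous tokens), its output distribution matches a teacher's conditional distribution. It uses the standard reverse-KL score-distillation gradient: each generated sample x is pushed along the difference between the generator's own score and the teacher's score, with both scores treated as constants. Everything here is a hypothetical stand-in, not the paper's implementation. The teacher is a 1-D Gaussian N(0.8c + 1, 1), the generator is x = w*c + b + eps, and the generator's score is known in closed form, whereas in practice it would have to be estimated (for example, by a learned network). Practical score-distillation methods also evaluate both scores on noise-perturbed samples, which this sketch omits.

```python
import numpy as np

# Hypothetical teacher: the conditional distribution of the current token's
# embedding x given the previous token c is N(0.8 * c + 1.0, 1).
def teacher_score(x, c):
    # d/dx log N(x; 0.8*c + 1.0, 1)
    return -(x - (0.8 * c + 1.0))

def fake_score(x, c, w, b):
    # Score of the generator's own conditional distribution N(w*c + b, 1).
    # Closed form in this toy; a real method would have to estimate it.
    return -(x - (w * c + b))

def distill(steps=500, lr=0.1, batch=256, seed=0):
    rng = np.random.default_rng(seed)
    w, b = 0.0, 0.0  # one-step generator parameters: x = w*c + b + eps
    for _ in range(steps):
        c = rng.standard_normal(batch)    # conditions (previous tokens)
        eps = rng.standard_normal(batch)  # generator input noise
        x = w * c + b + eps               # one-step samples
        # Score difference, treated as a constant (stop-gradient).
        diff = fake_score(x, c, w, b) - teacher_score(x, c)
        # Chain rule through the sample: dx/dw = c, dx/db = 1.
        grad_w = np.mean(diff * c)
        grad_b = np.mean(diff)
        # Gradient descent on the reverse KL between generator and teacher.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

w, b = distill()
print(w, b)
```

Because the score difference vanishes only when the generator's conditional matches the teacher's for every c, the parameters settle at w = 0.8 and b = 1.0 in this toy.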

Findings

1. Enables one-step sampling on ImageNet-256 with only a small FID increase, from 3.40 to 5.43.

2. Compared with the strongest baseline, DD1, reduces the gap between one-step sampling and the original AR model by 67%.

3. Achieves up to 12.3× training speed-up over DD1.

Abstract

Image auto-regressive (AR) models have emerged as a powerful paradigm for visual generation. Despite their promising performance, they generate slowly because sampling requires a large number of steps. Although Distilled Decoding 1 (DD1) was recently proposed to enable few-step sampling for image AR models, it still incurs significant performance degradation in the one-step setting and relies on a pre-defined mapping that limits its flexibility. In this work, we propose a new method, Distilled Decoding 2 (DD2), to further advance the feasibility of one-step sampling for image AR models. Unlike DD1, DD2 does not rely on a pre-defined mapping. We view the original AR model as a teacher model that provides the ground-truth conditional score in the latent embedding space at each token position. Based on this, we propose a novel conditional score…
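The abstract describes the AR teacher as supplying a conditional score in the latent embedding space at each token position. The sketch below shows one way such a score can be computed in closed form, under assumptions that are ours rather than the paper's. Suppose the teacher predicts a categorical distribution over codebook embeddings e_k, and the embedding is perturbed as z = alpha*e + sigma*n with Gaussian n. The perturbed distribution is then a Gaussian mixture, and its score is a responsibility-weighted sum of (alpha*e_k - z)/sigma^2. The codebook, logits, alpha, sigma and the function names are all hypothetical.

```python
import numpy as np

def softmax(v):
    # Numerically stable softmax: shift by the max before exponentiating.
    v = v - np.max(v)
    e = np.exp(v)
    return e / e.sum()

def noised_token_score(z, probs, codebook, alpha, sigma):
    """Score (gradient of the log-density w.r.t. z) of the Gaussian mixture
    sum_k probs[k] * N(z; alpha * codebook[k], sigma^2 I)."""
    diffs = alpha * codebook - z                  # (K, D): component means minus z
    sq = np.sum(diffs ** 2, axis=1)               # (K,): squared distances
    # Posterior responsibility of each codebook entry given the noisy z.
    resp = softmax(np.log(probs) - sq / (2 * sigma ** 2))
    return resp @ diffs / sigma ** 2              # (D,)

def log_density(z, probs, codebook, alpha, sigma):
    """Log-density of the same mixture, used only to check the score."""
    d = codebook.shape[1]
    sq = np.sum((z - alpha * codebook) ** 2, axis=1)
    log_comp = (np.log(probs) - sq / (2 * sigma ** 2)
                - 0.5 * d * np.log(2 * np.pi * sigma ** 2))
    m = np.max(log_comp)
    return m + np.log(np.sum(np.exp(log_comp - m)))  # log-sum-exp

# Hypothetical example: a 16-entry, 4-dimensional codebook and random
# logits standing in for the AR model's next-token prediction.
rng = np.random.default_rng(0)
codebook = rng.standard_normal((16, 4))
probs = softmax(rng.standard_normal(16))
z = rng.standard_normal(4)
alpha, sigma = 0.9, 0.5

score = noised_token_score(z, probs, codebook, alpha, sigma)

# Central finite differences of the log-density should match the score.
h = 1e-5
fd = np.array([
    (log_density(z + h * e, probs, codebook, alpha, sigma)
     - log_density(z - h * e, probs, codebook, alpha, sigma)) / (2 * h)
    for e in np.eye(4)
])
print(np.max(np.abs(score - fd)))
```

The finite-difference comparison is a quick check of the closed form. The responsibilities use a max-shifted softmax so they stay finite when sigma is small and the squared distances are large.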
