TL;DR
Drift-AR uses a single entropy signal to accelerate both stages of AR-diffusion hybrid models, enabling single-step (1-NFE) decoding with a reported 3.8-5.5× speedup while maintaining high quality.
Contribution
The paper proposes a novel entropy-informed framework that accelerates both AR and diffusion stages in vision models, achieving 3.8-5.5× speedup with one-step decoding.
Findings
Achieves 3.8-5.5× speedup over existing methods.
Enables genuine 1-NFE (single-step) decoding without quality loss.
Demonstrates effectiveness on the MAR, TransDiff, and NextStep-1 models.
Abstract
Autoregressive (AR)-diffusion hybrid paradigms combine AR's structured semantic modeling with diffusion's high-fidelity synthesis, yet suffer from a dual speed bottleneck: the sequential AR stage and the iterative multi-step denoising of the diffusion-based vision decoding stage. Existing methods address each bottleneck in isolation, without a unified design principle. We observe that the per-position prediction entropy of continuous-space AR models naturally encodes spatially varying generation uncertainty, which simultaneously governs draft prediction quality in the AR stage and reflects the corrective effort required by the vision decoding stage, a connection that has not been fully explored before. Since entropy is inherently tied to both bottlenecks, it serves as a natural unifying signal for joint acceleration. In this work, we propose Drift-AR, which leverages the entropy signal to accelerate both stages:…
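To make the idea of a per-position entropy map concrete, here is a minimal sketch. It is not the paper's method: the abstract excerpt does not say how the entropy is computed, so the sketch assumes, purely for illustration, that the AR model predicts a diagonal Gaussian at each position and that a fixed fraction of the lowest-entropy positions is treated as confident. The function names, the `keep_fraction` parameter, and the sample values are all hypothetical.

```python
import math


def gaussian_entropy(sigmas):
    """Differential entropy (in nats) of a diagonal Gaussian.

    sigmas: per-dimension standard deviations at one position.
    ASSUMPTION: a Gaussian predictive distribution; the source does not
    specify the form of the continuous-space prediction.
    """
    return sum(0.5 * math.log(2.0 * math.pi * math.e * s * s) for s in sigmas)


def entropy_map(per_position_sigmas):
    """Entropy for every position, given one list of std devs per position."""
    return [gaussian_entropy(s) for s in per_position_sigmas]


def low_entropy_mask(entropies, keep_fraction=0.5):
    """Flag the keep_fraction of positions with the lowest entropy.

    ASSUMPTION: a rank-based split is used only for illustration; how the
    entropy signal is actually consumed is not stated in the excerpt.
    """
    k = int(round(keep_fraction * len(entropies)))
    order = sorted(range(len(entropies)), key=lambda i: entropies[i])
    confident = set(order[:k])
    return [i in confident for i in range(len(entropies))]


# Hypothetical predictions for four positions, two dimensions each.
sigmas = [[0.1, 0.1], [1.0, 1.0], [0.5, 0.5], [2.0, 2.0]]
entropies = entropy_map(sigmas)
mask = low_entropy_mask(entropies, keep_fraction=0.5)
```

Under these assumptions, positions with small predicted variance receive low entropy and are flagged as confident, while high-entropy positions are the ones where more corrective effort would be expected.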
