AutoRefiner: Improving Autoregressive Video Diffusion Models via Reflective Refinement Over the Stochastic Sampling Path

Zhengyang Yu; Akio Hayakawa; Masato Ishii; Qingtao Yu; Takashi Shibuya; Jing Zhang; Yuki Mitsufuji

arXiv:2512.11203·cs.CV·December 16, 2025

TL;DR

AutoRefiner is a feedforward noise refiner tailored for autoregressive video diffusion models (AR-VDMs). It improves sample fidelity during stochastic sampling while avoiding the cost of optimization- or search-based inference-time alignment.

Contribution

The paper proposes AutoRefiner, a new noise refiner specifically designed for AR-VDMs, incorporating pathwise refinement and a reflective KV-cache to enhance sample quality efficiently.

Findings

01 AutoRefiner improves sample fidelity in AR-VDMs.

02 It operates as an efficient plug-in that leaves the AR-VDM's parameters unchanged.

03 Experiments demonstrate enhanced video sample quality.

Abstract

Autoregressive video diffusion models (AR-VDMs) show strong promise as scalable alternatives to bidirectional VDMs, enabling real-time and interactive applications. Yet there remains room for improvement in their sample fidelity. A promising solution is inference-time alignment, which optimizes the noise space to improve sample fidelity without updating model parameters. Yet, optimization- or search-based methods are computationally impractical for AR-VDMs. Recent text-to-image (T2I) works address this via feedforward noise refiners that modulate sampled noises in a single forward pass. Can such noise refiners be extended to AR-VDMs? We identify the failure of naively extending T2I noise refiners to AR-VDMs and propose AutoRefiner, a noise refiner tailored for AR-VDMs, with two key designs: pathwise noise refinement and a reflective KV-cache. Experiments demonstrate that AutoRefiner…
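To make the idea of a feedforward noise refiner concrete, the toy sketch below modulates a sampled noise vector in a single forward pass instead of running an optimization loop. It is not the AutoRefiner architecture: the function `refine_noise`, the random `weights` standing in for trained parameters, the `scale` factor, and the norm-preserving rescale are all assumptions made for illustration, and pathwise refinement and the reflective KV-cache are not modeled.

```python
import math
import random


def refine_noise(noise, weights, scale=0.1):
    """Hypothetical feedforward refiner: one residual pass over a noise vector."""
    # Single forward pass: a tiny linear layer plus tanh, standing in for a trained network.
    delta = [math.tanh(sum(w * x for w, x in zip(row, noise))) for row in weights]
    # Residual modulation: nudge the sampled noise rather than replace it.
    refined = [x + scale * d for x, d in zip(noise, delta)]
    # Assumption: rescale to the input norm so the refined noise keeps the same magnitude.
    norm_in = math.sqrt(sum(x * x for x in noise))
    norm_out = math.sqrt(sum(x * x for x in refined))
    return [x * norm_in / norm_out for x in refined]


rng = random.Random(0)
dim = 16
noise = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # sampled Gaussian noise
weights = [[rng.gauss(0.0, 1.0) / math.sqrt(dim) for _ in range(dim)]
           for _ in range(dim)]  # placeholder for learned parameters
refined = refine_noise(noise, weights)
```

The cost here is one pass through the refiner per noise sample, which is the property that distinguishes feedforward refiners from the optimization- or search-based alignment methods the abstract describes as impractical for AR-VDMs.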

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Image Enhancement Techniques · Image and Video Quality Assessment