End-to-End Training for Autoregressive Video Diffusion via Self-Resampling

Yuwei Guo; Ceyuan Yang; Hao He; Yang Zhao; Meng Wei; Zhenheng Yang; Weilin Huang; Dahua Lin

arXiv:2512.15702 · cs.CV · December 18, 2025

TL;DR

This paper introduces Resampling Forcing, an end-to-end, teacher-free training framework for autoregressive video diffusion models that improves temporal consistency and scalability by simulating inference errors and dynamically retrieving relevant history frames.

Contribution

The paper proposes a self-resampling scheme and a history routing mechanism that together enable scalable, end-to-end training of autoregressive video diffusion models without an external teacher.

Findings

01. Achieves performance comparable to that of distillation-based methods.
02. Exhibits superior temporal consistency on longer videos.
03. Supports efficient long-horizon video generation.

Abstract

Autoregressive video diffusion models hold promise for world simulation but are vulnerable to exposure bias arising from the train-test mismatch. While recent works address this via post-training, they typically rely on a bidirectional teacher model or online discriminator. To achieve an end-to-end solution, we introduce Resampling Forcing, a teacher-free framework that enables training autoregressive video models from scratch and at scale. Central to our approach is a self-resampling scheme that simulates inference-time model errors on history frames during training. Conditioned on these degraded histories, a sparse causal mask enforces temporal causality while enabling parallel training with frame-level diffusion loss. To facilitate efficient long-horizon generation, we further introduce history routing, a parameter-free mechanism that dynamically retrieves the top-k most relevant…
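The two structural ideas named in the abstract, a causal mask over frames and top-k retrieval of history frames, can be illustrated with a small sketch. The abstract is truncated and does not say how relevance is scored or how the sparse mask is laid out, so everything below is an assumption made for illustration: the function names `route_history` and `frame_causal_mask` are hypothetical, relevance is taken to be a plain dot product between one query vector and one key vector per history frame, and the mask is an ordinary frame-level causal mask rather than the paper's specific sparse variant.

```python
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def route_history(
    query: Sequence[float],
    history_keys: Sequence[Sequence[float]],
    k: int,
) -> List[int]:
    """Pick the k history frames whose key vectors score highest against the query.

    ASSUMPTION: dot-product scoring with one descriptor per frame; the source
    does not specify the scoring rule. The selection has no learned weights,
    which is one way to read "parameter-free".
    Returns the selected frame indices in temporal (ascending) order.
    """
    scores = [dot(query, key) for key in history_keys]
    # Rank frame indices by descending score; ties keep the earlier frame first.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def frame_causal_mask(num_frames: int, tokens_per_frame: int) -> List[List[bool]]:
    """Build a token-level mask where True means "query token i may attend to token j".

    ASSUMPTION: a generic frame-level causal mask. Tokens attend to every token
    in their own frame and in earlier frames, never to later frames.
    """
    n = num_frames * tokens_per_frame
    return [
        [(j // tokens_per_frame) <= (i // tokens_per_frame) for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    # Four history frames with 2-D descriptors; scores are 0.9, 0.1, 0.8, -1.0.
    keys = [[0.9, 0.1], [0.1, 0.9], [0.8, 0.0], [-1.0, 0.5]]
    print(route_history([1.0, 0.0], keys, k=2))  # prints [0, 2]
```

In this sketch the attention cost for the current frame depends on k rather than on the full history length, which is the general reason a top-k selection can help with long-horizon generation. How the paper combines routing with its sparse causal mask is not recoverable from the truncated abstract.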

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Human Pose and Action Recognition · Domain Adaptation and Few-Shot Learning