Causal Forcing: Autoregressive Diffusion Distillation Done Right for High-Quality Real-Time Interactive Video Generation

Hongzhou Zhu; Min Zhao; Guande He; Hang Su; Chongxuan Li; Jun Zhu

arXiv:2602.02214·cs.CV·May 22, 2026

1 Repo 2 Models

TL;DR

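This paper introduces Causal Forcing, a method for distilling bidirectional video diffusion models into few-step autoregressive (AR) models for real-time interactive generation. It initializes the AR student via ODE distillation from an autoregressive teacher rather than a bidirectional one, which bridges the architectural gap between full attention and causal attention.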

Contribution

The paper proposes Causal Forcing, a distillation technique that uses an autoregressive teacher for ODE initialization, bridging the gap between bidirectional and autoregressive models, and then applies a DMD procedure.

Findings

01. Outperforms all baselines across all metrics.

02. Surpasses the SOTA Self Forcing by 19.3% in Dynamic Degree.

03. Achieves an 8.7% improvement in VisionReward and 16.7% in Instruction Following.

Abstract

To achieve real-time interactive video generation, current methods distill pretrained bidirectional video diffusion models into few-step autoregressive (AR) models, facing an architectural gap when full attention is replaced by causal attention. However, existing approaches do not bridge this gap theoretically. They initialize the AR student via ODE distillation, which requires frame-level injectivity, where each noisy frame must map to a unique clean frame under the PF-ODE of an AR teacher. Distilling an AR student from a bidirectional teacher violates this condition, preventing recovery of the teacher's flow map and instead inducing a conditional-expectation solution, which degrades performance. To address this issue, we propose Causal Forcing, which uses an autoregressive teacher for ODE initialization to bridge the architectural gap, and then applies the same DMD procedure as in…
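The injectivity argument in the abstract can be illustrated with a toy scalar example. The sketch below is not the paper's method or code. It only assumes that the student is trained by mean-squared-error regression on (noisy frame, clean frame) pairs, and all names (`mse_optimal_student`, `injective_pairs`, `non_injective_pairs`) are hypothetical. When the same noisy input is paired with different clean targets, as can happen if the targets depend on context the causal student cannot see, the MSE-optimal prediction is the conditional mean of those targets rather than any single one of them.

```python
from collections import defaultdict
from statistics import mean


def mse_optimal_student(pairs):
    """Return the MSE-optimal prediction for each distinct input.

    For squared error, the best constant prediction for a given input is
    the mean of all targets observed for that input (the conditional mean).
    """
    targets_by_input = defaultdict(list)
    for noisy, clean in pairs:
        targets_by_input[noisy].append(clean)
    return {noisy: mean(cleans) for noisy, cleans in targets_by_input.items()}


def mse(pairs, predictor):
    """Mean squared error of a lookup-table predictor on the given pairs."""
    return mean((predictor[noisy] - clean) ** 2 for noisy, clean in pairs)


# Toy assumption: scalar "frames". Each noisy value maps to one clean value,
# so the mapping is injective and the student can match the targets exactly.
injective_pairs = [(0.0, -1.0), (1.0, 1.0)]

# Toy assumption: the same noisy value 0.5 maps to -1.0 or +1.0 depending on
# information the student never sees, so the mapping is not injective.
non_injective_pairs = [(0.5, -1.0), (0.5, 1.0)]

injective_student = mse_optimal_student(injective_pairs)
collapsed_student = mse_optimal_student(non_injective_pairs)

print("injective case:", injective_student, "MSE =", mse(injective_pairs, injective_student))
print("non-injective case:", collapsed_student, "MSE =", mse(non_injective_pairs, collapsed_student))
```

In the injective case the student reproduces both targets with zero error. In the non-injective case it predicts 0.0 for the input 0.5, a value that matches neither target, which is the scalar analogue of the conditional-expectation solution the abstract describes.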

Code & Models

Repositories

thu-ml/Causal-Forcing (GitHub)

Models

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Advanced Vision and Imaging · Human Motion and Animation