Causal Forcing++: Scalable Few-Step Autoregressive Diffusion Distillation for Real-Time Interactive Video Generation

Min Zhao; Hongzhou Zhu; Kaiwen Zheng; Zihan Zhou; Bokai Yan; Xinyuan Li; Xiao Yang; Chongxuan Li; Jun Zhu

arXiv:2605.15141 · cs.CV · May 15, 2026

TL;DR

This paper introduces Causal Forcing++, a scalable distillation pipeline for real-time interactive video generation. It targets frame-wise autoregressive diffusion with only 1-2 sampling steps, lowering latency while improving quality over prior few-step autoregressive students.

Contribution

The paper proposes causal consistency distillation (causal CD) as an efficient, scalable way to initialize few-step autoregressive (AR) students, and reports better speed and quality than state-of-the-art methods.

Findings

01 In the 2-step frame-wise setting, outperforms the state-of-the-art 4-step chunk-wise Causal Forcing.

02 Reduces first-frame latency by 50%.

03 Cuts Stage 2 training cost by approximately 4x.

Abstract

Real-time interactive video generation requires low-latency, streaming, and controllable rollout. Existing autoregressive (AR) diffusion distillation methods have achieved strong results in the chunk-wise 4-step regime by distilling bidirectional base models into few-step AR students, but they remain limited by coarse response granularity and non-negligible sampling latency. In this paper, we study a more aggressive setting: frame-wise autoregression with only 1-2 sampling steps. In this regime, we identify the initialization of a few-step AR student as the key bottleneck: existing strategies are either target-misaligned, incapable of few-step generation, or too costly to scale. We propose Causal Forcing++, a principled and scalable pipeline that uses causal consistency distillation (causal CD) for few-step AR initialization. The core idea is that causal CD learns the…
