Delta Forcing: Trust Region Steering for Interactive Autoregressive Video Generation
Yuheng Wu, Xiangbo Gao, Tianhao Chen, Xinghao Chen, Qing Yin, Zhengzhong Tu, and Dongman Lee

TL;DR
Delta Forcing is a trust-region framework for interactive autoregressive video generation. It constrains unreliable teacher supervision so that the model stays responsive to new events while keeping long-horizon temporal coherence.
Contribution
It proposes a trust-region approach, inspired by Trust Region Policy Optimization, to improve both stability and reactivity in interactive video generation models.
Findings
Significantly improves temporal consistency in generated videos.
Maintains responsiveness to dynamic event conditions.
Outperforms existing methods in experimental evaluations.
Abstract
Interactive real-time autoregressive video generation is essential for applications such as content creation and world modeling, where visual content must adapt to dynamically evolving event conditions. A fundamental challenge lies in balancing reactivity and stability: models must respond promptly to new events while maintaining temporal coherence over long horizons. Existing approaches distill bidirectional models into autoregressive generators and further adapt them via streaming long tuning, yet often exhibit persistent drift after condition changes. We identify the cause as conditional bias, where the teacher may provide condition-aligned but trajectory-agnostic guidance, biasing generation toward locally valid yet globally inconsistent modes. Inspired by Trust Region Policy Optimization, we propose Delta Forcing, a simple yet effective framework that constrains unreliable teacher…
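
For background on the method the abstract cites as its inspiration, the standard Trust Region Policy Optimization (TRPO) problem is given below. This is the original reinforcement-learning formulation, not Delta Forcing's own objective; the truncated abstract does not specify how the constraint is applied to teacher supervision. The symbols follow the usual TRPO notation and are not taken from this paper: \pi_\theta is the policy being updated, \pi_{\theta_{\text{old}}} is the previous policy, A is the advantage function, and \delta is the trust-region radius.

```latex
\begin{aligned}
\max_{\theta}\quad
  & \mathbb{E}_{s,a \sim \pi_{\theta_{\text{old}}}}
    \left[ \frac{\pi_\theta(a \mid s)}{\pi_{\theta_{\text{old}}}(a \mid s)}\,
           A^{\pi_{\theta_{\text{old}}}}(s,a) \right] \\
\text{s.t.}\quad
  & \mathbb{E}_{s \sim \pi_{\theta_{\text{old}}}}
    \left[ D_{\mathrm{KL}}\!\left( \pi_{\theta_{\text{old}}}(\cdot \mid s)
           \,\middle\|\, \pi_\theta(\cdot \mid s) \right) \right] \le \delta
\end{aligned}
```

The constraint limits how far each update may move from the previous policy. By analogy, a trust region in the video setting would limit how far guidance can pull generation away from its current trajectory, though the exact form used by the authors is not stated here.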
