Context Forcing: Consistent Autoregressive Video Generation with Long Context

Shuo Chen; Cong Wei; Sun Sun; Ping Nie; Kai Zhou; Ge Zhang; Ming-Hsuan Yang; Wenhu Chen

arXiv:2602.06028·cs.CV·February 6, 2026

TL;DR

This paper introduces Context Forcing, a training framework for long video generation in which a long-context teacher supervises a long-context student, improving global temporal consistency and extending effective context length beyond 20 seconds.

Contribution

It proposes a long-context training approach with a Slow-Fast Memory system to enable long-term video generation, addressing the student-teacher mismatch in existing methods.

Findings

01. Enables effective context lengths exceeding 20 seconds.

02. Outperforms state-of-the-art methods like LongLive and Infinite-RoPE.

03. Maintains superior long-term consistency in generated videos.

Abstract

Recent approaches to real-time long video generation typically employ streaming tuning strategies, attempting to train a long-context student using a short-context (memoryless) teacher. In these frameworks, the student performs long rollouts but receives supervision from a teacher limited to short 5-second windows. This structural discrepancy creates a critical student-teacher mismatch: the teacher's inability to access long-term history prevents it from guiding the student on global temporal dependencies, effectively capping the student's context length. To resolve this, we propose Context Forcing, a novel framework that trains a long-context student via a long-context teacher. By ensuring the teacher is aware of the full generation history, we eliminate the supervision mismatch, enabling the robust training of models capable of long-term consistency. To make this…
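
The student-teacher mismatch described in the abstract can be illustrated with a toy calculation. This is a sketch and not the paper's implementation: the helper `visible_history`, the frame rate of 16 fps, and the modelling of a teacher as a sliding window over frame indices are all assumptions made for illustration. Only the 5-second teacher window and the 20-second context figure come from the text.

```python
from typing import Optional


def visible_history(frame_idx: int, fps: int,
                    window_seconds: Optional[float]) -> range:
    """Past frame indices a teacher can condition on when supervising frame_idx.

    window_seconds=None models a long-context teacher that sees the full
    generation history; a number models a short-context sliding window.
    (Hypothetical helper, not taken from the paper.)
    """
    if window_seconds is None:
        return range(0, frame_idx)
    window_frames = int(window_seconds * fps)
    return range(max(0, frame_idx - window_frames), frame_idx)


FPS = 16                      # assumed frame rate, not stated in the source
frame_idx = 20 * FPS          # a frame 20 seconds into the student's rollout

short_ctx = visible_history(frame_idx, FPS, 5.0)   # short-context teacher
long_ctx = visible_history(frame_idx, FPS, None)   # long-context teacher

# Frames the student has generated that the short-context teacher never sees.
unseen = len(long_ctx) - len(short_ctx)
print(f"short teacher sees {len(short_ctx)} frames, "
      f"long teacher sees {len(long_ctx)}, unseen by short: {unseen}")
```

Under these assumptions, a teacher restricted to a 5-second window cannot provide any signal about the first 15 seconds of a 20-second rollout, which is the gap a long-context teacher is meant to close.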

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Advanced Vision and Imaging · Human Pose and Action Recognition