Deep Forcing: Training-Free Long Video Generation with Deep Sink and Participative Compression

Jung Yi; Wooseok Jang; Paul Hyunbin Cho; Jisu Nam; Heeji Yoon; Seungryong Kim

arXiv:2512.05081·cs.CV·December 5, 2025

Open Access

TL;DR

This paper introduces Deep Forcing, a training-free method for long video generation that stabilizes extended autoregressive streaming and improves output quality, surpassing existing methods in fidelity and consistency.

Contribution

Deep Forcing introduces two mechanisms, Deep Sink and Participative Compression, that improve long video generation without any training or fine-tuning.

Findings

01. Achieves over 12x extrapolation in video length with quality improvements.

02. Outperforms existing methods like LongLive and RollingForcing in quality and consistency.

03. Maintains real-time generation while significantly extending video duration.

Abstract

Recent advances in autoregressive video diffusion have enabled real-time frame streaming, yet existing solutions still suffer from temporal repetition, drift, and motion deceleration. We find that naively applying StreamingLLM-style attention sinks to video diffusion leads to fidelity degradation and motion stagnation. To overcome this, we introduce Deep Forcing, which consists of two training-free mechanisms that address this without any fine-tuning. Specifically, 1) Deep Sink dedicates half of the sliding window to persistent sink tokens and re-aligns their temporal RoPE phase to the current timeline, stabilizing global context during long rollouts. 2) Participative Compression performs importance-aware KV cache pruning that preserves only tokens actively participating in recent attention while safely discarding redundant and degraded history, minimizing error accumulation under…
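
The two mechanisms named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the frame-level indexing, the contiguous position shift used for "re-alignment", and the summed-attention importance score are all assumptions made for illustration. Only the half-window sink split and the idea of keeping tokens that participate in recent attention come from the abstract.

```python
def deep_sink_window(cache_len, window, sink_fraction=0.5):
    """Indices of cached frames to keep: the earliest frames as persistent
    sinks plus the most recent frames. sink_fraction=0.5 mirrors the
    abstract's 'half of the sliding window'; the rest is assumed."""
    if cache_len <= window:
        return list(range(cache_len))
    n_sink = int(window * sink_fraction)
    n_recent = window - n_sink
    sinks = list(range(n_sink))
    recent = list(range(cache_len - n_recent, cache_len))
    return sinks + recent


def realign_sink_positions(kept, n_sink):
    """Assumed re-alignment rule: shift the sinks' temporal positions so
    they end immediately before the oldest recent frame. A real model
    would feed these positions into its temporal RoPE."""
    if len(kept) <= n_sink:
        return list(kept)
    shift = kept[n_sink] - n_sink
    return [i + shift for i in kept[:n_sink]] + list(kept[n_sink:])


def participative_prune(attn_rows, budget, protected=()):
    """Keep at most `budget` cached keys. attn_rows holds one row of
    attention weights per recent query. Importance is assumed to be the
    summed weight a key receives; `protected` keys (e.g. sinks) always stay."""
    n_keys = len(attn_rows[0])
    scores = [sum(row[j] for row in attn_rows) for j in range(n_keys)]
    protected = set(protected)
    candidates = [j for j in range(n_keys) if j not in protected]
    n_free = max(budget - len(protected), 0)
    top = sorted(candidates, key=lambda j: scores[j], reverse=True)[:n_free]
    # Return indices in temporal order so the cache layout stays monotonic.
    return sorted(protected | set(top))
```

For example, with 20 cached frames and a window of 8, `deep_sink_window(20, 8)` keeps frames `[0, 1, 2, 3, 16, 17, 18, 19]`, and `realign_sink_positions` on that list with `n_sink=4` assigns the positions `[12, 13, 14, 15, 16, 17, 18, 19]`, so under this assumed rule the sinks sit directly before the recent frames on the current timeline.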

Taxonomy

Topics: Image and Video Quality Assessment · Generative Adversarial Networks and Image Synthesis · Image Enhancement Techniques