Relax Forcing: Relaxed KV-Memory for Consistent Long Video Generation
Zengqun Zhao, Yanzuo Lu, Ziquan Liu, Jifei Song, Jiankang Deng, Ioannis Patras

TL;DR
This paper introduces Relax Forcing, a structured temporal memory mechanism for autoregressive diffusion models, which improves long video generation by effectively utilizing relevant past information to enhance motion dynamics and temporal consistency.
Contribution
The paper proposes Relax Forcing, a novel structured memory approach that decomposes temporal context into functional roles, significantly improving long-horizon video generation quality.
Findings
Enhances motion dynamics and temporal consistency in long videos.
Reduces attention overhead during inference.
Structured memory outperforms dense history in long video synthesis.
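To make the contrast between dense history and a structured memory concrete, here is a minimal sketch of picking a small subset of past frames to keep in the KV cache. The source does not spell out which functional roles Relax Forcing uses, so the three-way split (early anchor frames, sparsely sampled middle frames, a recent window), the function names, and the budget parameters are all hypothetical illustrations, not the authors' method.

```python
def select_memory_frames(num_past, num_anchor, num_recent, num_sparse):
    """Return indices of past frames to keep in KV memory.

    Hypothetical role split (an assumption, not taken from the paper):
      - anchors: the earliest frames
      - sparse:  evenly spaced frames from the middle of the history
      - recent:  the most recent frames
    """
    budget = num_anchor + num_recent + num_sparse
    if num_past <= budget:
        # History still fits in the budget: keep every frame.
        return list(range(num_past))

    anchors = list(range(num_anchor))
    recent = list(range(num_past - num_recent, num_past))

    # Middle region lies strictly between the anchors and the recent window.
    mid_start = num_anchor
    mid_len = num_past - num_recent - num_anchor
    sparse = [mid_start + (i * mid_len) // num_sparse for i in range(num_sparse)]

    return anchors + sparse + recent


def kv_fraction(kept, num_past):
    """Fraction of past frames whose keys/values are attended to."""
    return len(kept) / num_past


kept = select_memory_frames(num_past=100, num_anchor=2, num_recent=4, num_sparse=4)
print(kept, kv_fraction(kept, 100))
```

With these illustrative settings, 10 of 100 past frames are kept, so each new frame attends to roughly a tenth of the keys and values that a dense history would require. Whether such a subset also improves quality depends on which frames are chosen, which is the question the paper investigates.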
Abstract
Autoregressive (AR) video diffusion has recently emerged as a promising paradigm for long video generation, enabling causal synthesis beyond the limits of bidirectional models. To address training-inference mismatch, a series of self-forcing strategies have been proposed to improve rollout stability by conditioning the model on its own predictions during training. While these approaches substantially mitigate exposure bias, extending generation to minute-scale horizons remains challenging due to progressive temporal degradation. In this work, we show that this limitation is not primarily caused by insufficient memory, but by how temporal memory is utilised during inference. Through empirical analysis, we find that increasing memory does not consistently improve long-horizon generation, and that the temporal placement of historical context significantly influences motion dynamics while…
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Human Motion and Animation · Virtual Reality Applications and Impacts
