TL;DR
DySink is a retrieval-based framework for autoregressive long video generation that adaptively selects relevant historical frames as dynamic sinks, improving temporal quality and reducing sink collapse.
Contribution
The paper introduces DySink, a retrieval-based method with a sink anomaly gate that improves long video generation by keeping the long-range context adaptive and relevant to the current visual state.
Findings
DySink improves dynamic degree over baselines.
DySink achieves higher temporal quality than baselines.
Code is available at https://github.com/yebo0216best/DySink.
Abstract
Autoregressive long video generation often adopts bounded-memory streaming for efficiency, typically combining local windows for short-term continuity with static early-frame sinks as long-range anchors. However, this fixed allocation keeps early frames cached even when the current visual state has substantially diverged from them, while discarding potentially more relevant intermediate history. As a result, the retained long-range context may become less adaptive and bias generation toward outdated cues; in severe cases, RoPE-induced phase re-alignment can homogenize inter-head attention and cause sink collapse, where content regresses toward sink frames. We propose DySink, a retrieval-based framework that maintains a compact memory bank and selects visually relevant historical frames as dynamic frame sinks. DySink couples adaptive retrieval with a sink anomaly gate, which detects…
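To make the contrast in the abstract concrete, here is a minimal sketch of the two context policies: a static baseline that always keeps the earliest frames as sinks next to a local window, and a retrieval-style variant that moves evicted frames into a compact memory bank and picks the bank frames most similar to the current visual state as sinks. Everything here is an illustrative assumption, not the authors' implementation: the per-frame embedding vectors, cosine similarity as the relevance score, top-k selection, FIFO eviction from the bank, and all names (`DynamicSinkCache`, `static_sink_context`, `window`, `num_sinks`, `bank_size`) are hypothetical. The sink anomaly gate is not modeled because its detection criterion is not described in this summary.

```python
import math
from collections import deque
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two frame descriptors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def static_sink_context(frame_ids: List[int], num_sinks: int, window: int) -> List[int]:
    """Baseline from the abstract: fixed early-frame sinks plus a local window."""
    sinks = frame_ids[:num_sinks]
    recent = frame_ids[-window:] if window > 0 else []
    # Skip frames already kept as sinks when the history is short.
    return sinks + [f for f in recent if f not in sinks]


class DynamicSinkCache:
    """Hypothetical bounded-memory context: a local window of recent frames plus
    `num_sinks` sink frames retrieved from a bank of older (evicted) frames."""

    def __init__(self, window: int, num_sinks: int, bank_size: int):
        self.window: deque = deque(maxlen=window)   # (frame_id, embedding), most recent frames
        self.bank: deque = deque(maxlen=bank_size)  # assumed FIFO memory bank of evicted frames
        self.num_sinks = num_sinks

    def push(self, frame_id: int, emb: List[float]) -> None:
        # When the window is full, its oldest frame moves into the memory bank
        # instead of being discarded; the deque then drops it from the window.
        if len(self.window) == self.window.maxlen:
            self.bank.append(self.window[0])
        self.window.append((frame_id, emb))

    def select_sinks(self, query: Sequence[float]) -> List[int]:
        # Rank bank frames by similarity to the current visual state; keep the top k.
        ranked: List[Tuple[int, List[float]]] = sorted(
            self.bank, key=lambda item: cosine(query, item[1]), reverse=True
        )
        return sorted(fid for fid, _ in ranked[: self.num_sinks])

    def context(self, query: Sequence[float]) -> List[int]:
        # Retrieved sinks first, then the local window in temporal order.
        return self.select_sinks(query) + [fid for fid, _ in self.window]


# Toy history: frames 0-3 show "scene A", frames 4-9 show "scene B".
embeddings = [[1.0, 0.0]] * 4 + [[0.0, 1.0]] * 6
cache = DynamicSinkCache(window=2, num_sinks=2, bank_size=8)
for fid, emb in enumerate(embeddings):
    cache.push(fid, emb)

query = embeddings[-1]  # stand-in for the current visual state
print(static_sink_context(list(range(10)), num_sinks=2, window=2))  # [0, 1, 8, 9]
print(cache.context(query))  # [4, 5, 8, 9]
```

In the toy run, the static policy keeps frames 0 and 1 as anchors even though the scene has changed, whereas the retrieval variant anchors on scene-B frames from the bank, which mirrors the "outdated cues" problem the abstract describes. Ties in similarity are broken by bank order because Python's `sorted` stays stable with `reverse=True`.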
