Knot Forcing: Taming Autoregressive Video Diffusion Models for Real-time Infinite Interactive Portrait Animation
Steven Xiao, Xindi Zhang, Dechao Meng, Qi Wang, Peng Zhang, Bang Zhang

TL;DR
Knot Forcing introduces a real-time streaming framework for portrait animation that maintains high visual fidelity and temporal coherence through chunk-wise generation, a temporal knot module, and a dynamic reference update mechanism.
Contribution
The paper presents a novel streaming framework with innovative modules to enable real-time, high-quality, and long-term coherent portrait animation from autoregressive video diffusion models.
Findings
Achieves real-time performance on consumer GPUs.
Maintains long-term temporal coherence in portrait animation.
Produces high-fidelity, visually stable animated portraits.
Abstract
Real-time portrait animation is essential for interactive applications such as virtual assistants and live avatars, requiring high visual fidelity, temporal coherence, ultra-low latency, and responsive control from dynamic inputs like reference images and driving signals. While diffusion-based models achieve strong quality, their non-causal nature hinders streaming deployment. Causal autoregressive video generation approaches enable efficient frame-by-frame generation but suffer from error accumulation, motion discontinuities at chunk boundaries, and degraded long-term consistency. In this work, we present a novel streaming framework named Knot Forcing for real-time portrait animation that addresses these challenges through three key designs: (1) a chunk-wise generation strategy with global identity preservation via cached KV states of the reference image and local temporal modeling…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsGenerative Adversarial Networks and Image Synthesis · Computer Graphics and Visualization Techniques · Human Motion and Animation
