TL;DR
This paper introduces IAMFlow, a training-free, identity-aware memory framework for long video generation that keeps entity identities consistent across prompt transitions while improving both generation quality and speed over baselines.
Contribution
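It proposes an explicit entity-tracking method in which an LLM extracts entities and their visual attributes from each prompt and a VLM verifies those attributes against rendered frames, along with a new benchmark for narrative streaming video generation.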
Findings
IAMFlow outperforms baselines by 2.56 points on NarraStream-Bench.
It achieves a 1.39× speedup over the most efficient baseline.
The framework effectively maintains entity identities across prompts.
Abstract
Autoregressive video generation has improved rapidly in visual fidelity and interactivity, but it still suffers from long-term inconsistency and memory degradation. Most existing solutions either compress historical frames using predefined strategies or retrieve keyframes based on coarse implicit attention signals, both of which fail to handle evolving prompts with shifting entity references, leading to identity drift, character duplication, and attribute loss. To address this, we propose IAMFlow, a training-free identity-aware memory framework that explicitly models and tracks persistent entity identities, enabling consistent generation across prompt transitions. Specifically, an LLM extracts entities with visual attributes from each prompt and assigns unique global IDs for identity-aware memory, while a VLM asynchronously verifies and refines attributes from rendered frames, enabling…
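To make the idea of identity-aware memory concrete, the following is a minimal sketch of a registry that assigns a persistent global ID to each entity and merges attributes across prompts. It is not the paper's implementation: the class names `Entity` and `IdentityMemory`, the method `register`, and the rule of matching entities by normalized name are all assumptions made for illustration. In IAMFlow the entities and attributes come from an LLM and are refined by a VLM; here they are passed in by hand.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Entity:
    """One tracked entity (hypothetical structure, not from the paper)."""
    global_id: int
    name: str
    attributes: set = field(default_factory=set)


class IdentityMemory:
    """Hypothetical registry mapping entity names to persistent global IDs."""

    def __init__(self):
        self._next_id = count(1)   # IDs start at 1 and are never reused
        self._by_name = {}         # normalized name -> Entity

    def register(self, name, attributes):
        # Assumption: two mentions refer to the same entity when their
        # normalized names match. The paper delegates this to an LLM.
        key = name.strip().lower()
        entity = self._by_name.get(key)
        if entity is None:
            entity = Entity(next(self._next_id), key)
            self._by_name[key] = entity
        # Merge newly mentioned attributes instead of overwriting old ones,
        # so earlier details survive later prompts.
        entity.attributes |= set(attributes)
        return entity


# Usage: two prompts mention the same fox, so it keeps one global ID.
memory = IdentityMemory()
fox_first = memory.register("Fox", {"red fur"})
fox_again = memory.register("fox ", {"white tail tip"})
owl = memory.register("owl", {"grey feathers"})
```

Under these assumptions, the second mention of the fox resolves to the existing entry rather than creating a duplicate, which is the failure mode the abstract calls character duplication.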
