Gloria: Consistent Character Video Generation via Content Anchors
Yuhang Yang, Fan Zhang, Huaijin Pi, Shuai Guo, Guowei Xu, Wei Zhai, Yang Cao, Zheng-Jun Zha

TL;DR
Gloria introduces a novel method for generating long-duration, multi-view consistent character videos using content anchors, addressing identity preservation and multi-reference conflicts.
Contribution
The paper proposes a content anchor-based framework with mechanisms to enhance consistency and scalability for character video generation.
Findings
Generated videos exceed 10 minutes in length.
Achieved superior identity and appearance consistency across views.
Outperformed existing methods in quality and consistency.
Abstract
Digital characters are central to modern media, yet generating character videos with long duration, consistent multi-view appearance, and expressive identity remains challenging. Existing approaches either provide insufficient context to preserve identity or leverage non-character-centric information as the memory, leading to suboptimal consistency. Recognizing that character video generation inherently resembles an outside-looking-in scenario, we propose representing the character's visual attributes through a compact set of anchor frames. This design provides stable references for consistency, but reference-based video generation inherently faces the challenges of copy-pasting and multi-reference conflicts. To address these, we introduce two mechanisms: Superset Content Anchoring, providing intra- and extra-training clip cues to prevent duplication, and RoPE as Weak…
