ContextAnyone: Context-Aware Diffusion for Character-Consistent Text-to-Video Generation

Ziyang Mai; Yu-Wing Tai

arXiv:2512.07328 · cs.CV · December 9, 2025

TL;DR

ContextAnyone is a context-aware diffusion framework that keeps a character's identity consistent across video scenes by integrating information from a single reference image into text-to-video generation.

Contribution

The paper presents a diffusion-based approach that combines an Emphasize-Attention module with a dual-guidance loss to improve character consistency and visual quality in text-to-video synthesis.

Findings

1. Outperforms existing methods in identity consistency.
2. Generates coherent videos with diverse motions and scenes.
3. Enhances visual fidelity through novel model components.

Abstract

Text-to-video (T2V) generation has advanced rapidly, yet maintaining consistent character identities across scenes remains a major challenge. Existing personalization methods often focus on facial identity but fail to preserve broader contextual cues such as hairstyle, outfit, and body shape, which are critical for visual coherence. We propose ContextAnyone, a context-aware diffusion framework that achieves character-consistent video generation from text and a single reference image. Our method jointly reconstructs the reference image and generates new video frames, enabling the model to fully perceive and utilize reference information. Reference information is integrated into a DiT-based diffusion backbone through a novel Emphasize-Attention module that selectively reinforces reference-aware features and prevents identity drift across frames. A dual-guidance loss…
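The abstract describes two mechanisms only at a high level: the reference image is reconstructed jointly with the new frames, and an Emphasize-Attention module reinforces reference-aware features. The toy sketch below is one hypothetical reading of those ideas, not the authors' implementation. It assumes that reference and video tokens share a single sequence, models "emphasis" as an additive logit bias on reference keys (the hypothetical `emphasis` parameter), and uses an equally weighted mean squared error for the joint reconstruction. All function names are illustrative, and the dual-guidance loss is not modeled because the abstract is cut off before describing it.

```python
import math
from typing import List, Sequence, Tuple

Vec = List[float]


def build_joint_sequence(ref_tokens: Sequence[Vec], video_tokens: Sequence[Vec]) -> Tuple[List[Vec], List[bool]]:
    """Concatenate reference-image tokens and video-frame tokens into one sequence
    so a single denoiser sees (and reconstructs) both. Hypothetical layout:
    reference tokens first, then video tokens. Also returns per-token reference flags."""
    tokens = [list(t) for t in ref_tokens] + [list(t) for t in video_tokens]
    is_ref = [True] * len(ref_tokens) + [False] * len(video_tokens)
    return tokens, is_ref


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(q: Vec, keys: Sequence[Vec], is_ref: Sequence[bool], emphasis: float) -> List[float]:
    """Scaled dot-product attention weights for one query. Keys that come from the
    reference receive an extra additive logit bias `emphasis` (an assumption, not
    the paper's stated formulation)."""
    scale = 1.0 / math.sqrt(len(q))
    logits = [
        sum(qi * ki for qi, ki in zip(q, k)) * scale + (emphasis if ref else 0.0)
        for k, ref in zip(keys, is_ref)
    ]
    return softmax(logits)


def emphasized_attention(queries: Sequence[Vec], keys: Sequence[Vec], values: Sequence[Vec],
                         is_ref: Sequence[bool], emphasis: float) -> List[Vec]:
    """Attention output for every query: a weighted sum of value vectors."""
    out = []
    for q in queries:
        w = attention_weights(q, keys, is_ref, emphasis)
        out.append([sum(wj * v[c] for wj, v in zip(w, values)) for c in range(len(values[0]))])
    return out


def joint_reconstruction_loss(pred: Sequence[Vec], target: Sequence[Vec]) -> float:
    """Mean squared error over all tokens, so reference tokens are reconstructed
    alongside video tokens (equal weighting is an assumption)."""
    n = sum(len(t) for t in target)
    return sum((p - t) ** 2 for pt, tt in zip(pred, target) for p, t in zip(pt, tt)) / n


if __name__ == "__main__":
    ref = [[1.0, 0.0]]                # one toy reference token
    video = [[0.0, 1.0], [0.5, 0.5]]  # two toy video tokens
    tokens, is_ref = build_joint_sequence(ref, video)
    w_plain = attention_weights([0.0, 0.0], tokens, is_ref, emphasis=0.0)
    w_emph = attention_weights([0.0, 0.0], tokens, is_ref, emphasis=2.0)
    print(w_plain[0] < w_emph[0])  # True: emphasis shifts attention toward the reference token
```

Modeling emphasis as a logit bias keeps the attention a proper softmax, so raising `emphasis` smoothly shifts weight toward reference tokens without discarding the video context.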

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Face recognition and analysis · Multimodal Machine Learning Applications