ASemConsist: Adaptive Semantic Feature Control for Training-Free Identity-Consistent Generation
Shin Seong Kim, Minjung Shin, Hyunin Cho, Youngjung Uh

TL;DR
ASemConsist is a training-free method for keeping a character's identity consistent across a sequence of images generated by text-to-image diffusion models. It works by selectively modifying semantic features, and the paper also proposes a protocol for evaluating identity preservation.
Contribution
It proposes a new semantic control framework that improves identity consistency without sacrificing prompt alignment, using adaptive feature sharing and a unified evaluation protocol.
Findings
Achieves state-of-the-art identity consistency in image sequences.
Effectively balances identity preservation and prompt alignment.
Introduces the CQS metric for comprehensive evaluation.
Abstract
Recent text-to-image diffusion models have significantly improved visual quality and text alignment. However, generating a sequence of images while preserving consistent character identity across diverse scene descriptions remains a challenging task. Existing methods often struggle with a trade-off between maintaining identity consistency and ensuring per-image prompt alignment. In this paper, we introduce a novel framework, ASemConsist, that addresses this challenge through selective text embedding modification, enabling explicit semantic control over character identity without sacrificing prompt alignment. Furthermore, based on our analysis of padding embeddings in FLUX, we propose a semantic control strategy that repurposes padding embeddings as semantic containers. Additionally, we introduce an adaptive feature-sharing strategy that automatically evaluates textual ambiguity and…
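The abstract does not say how identity information is placed into FLUX's padding embeddings. As a toy illustration of the general idea of "padding embeddings as semantic containers" only, the sketch below assumes a fixed-length prompt embedding paired with a 0/1 mask in which 0 marks padding positions. The helper `fill_padding_with_identity`, the mask convention, and the rule of cycling identity vectors through the padding slots are all hypothetical and are not the authors' method; the point is simply that real prompt tokens stay untouched while otherwise unused padding slots carry extra identity content.

```python
from typing import List

Vector = List[float]


def fill_padding_with_identity(
    prompt_emb: List[Vector],
    attention_mask: List[int],
    identity_emb: List[Vector],
) -> List[Vector]:
    """Hypothetical helper (not from the paper): return a copy of prompt_emb
    whose padding slots (mask == 0) hold identity vectors.

    Real prompt tokens (mask == 1) are copied unchanged, so the per-image
    prompt content is preserved. Identity vectors are written into the
    padding positions in order, cycling if there are more padding slots
    than identity vectors (an assumed rule, chosen for illustration).
    """
    if len(prompt_emb) != len(attention_mask):
        raise ValueError("embedding and mask lengths differ")
    if not identity_emb:
        raise ValueError("identity_emb must be non-empty")

    # Copy every token vector so the caller's embedding is never mutated.
    out = [list(v) for v in prompt_emb]
    pad_positions = [i for i, m in enumerate(attention_mask) if m == 0]
    for k, pos in enumerate(pad_positions):
        out[pos] = list(identity_emb[k % len(identity_emb)])
    return out


if __name__ == "__main__":
    # Two real tokens followed by three padding slots (toy 2-D embeddings).
    prompt = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
    mask = [1, 1, 0, 0, 0]
    identity = [[0.5, 0.5], [0.9, 0.1]]
    print(fill_padding_with_identity(prompt, mask, identity))
```

Copying instead of mutating in place keeps the original prompt embedding available, which matters if the same prompt is reused for images in a sequence that should not receive the identity content.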
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Face recognition and analysis
