Inject Where It Matters: Training-Free Spatially-Adaptive Identity Preservation for Text-to-Image Personalization

Guandong Li; Mengxia Ye

arXiv:2602.13994·cs.CV·February 17, 2026

TL;DR

SpatialID is a training-free method that steers identity features toward face-relevant regions during text-to-image generation, preserving identity while keeping backgrounds and lighting free of identity contamination, without fine-tuning.

Contribution

The paper introduces a spatially-adaptive identity modulation framework and a Temporal-Spatial Scheduling strategy to address the background contamination caused by spatially uniform identity injection in personalization.

Findings

01. Achieves state-of-the-art text adherence and visual consistency.

02. Significantly reduces background contamination in generated images.

03. Maintains robust identity preservation without fine-tuning.

Abstract

Personalized text-to-image generation aims to integrate specific identities into arbitrary contexts. However, existing tuning-free methods typically employ Spatially Uniform Visual Injection, causing identity features to contaminate non-facial regions (e.g., backgrounds and lighting) and degrading text adherence. To address this without expensive fine-tuning, we propose SpatialID, a training-free spatially-adaptive identity modulation framework. SpatialID fundamentally decouples identity injection into face-relevant and context-free regions using a Spatial Mask Extractor derived from cross-attention responses. Furthermore, we introduce a Temporal-Spatial Scheduling strategy that dynamically adjusts spatial constraints (transitioning from Gaussian priors to attention-based masks and adaptive relaxation) to align with the diffusion generation dynamics. Extensive experiments on IBench…
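
To make the idea in the abstract concrete, the following is a minimal NumPy sketch of mask-gated identity injection with a three-phase schedule. It is an illustration under stated assumptions, not the authors' implementation: the function names, the phase boundaries `early` and `late`, the Gaussian width `sigma`, the min-max normalisation of the attention map, and the form of the relaxation step are all hypothetical choices that the abstract does not specify.

```python
import numpy as np

def gaussian_prior(h, w, center=(0.5, 0.5), sigma=0.25):
    """Soft mask peaked at `center` (given as fractions of height and width).
    Assumption: the abstract only says "Gaussian priors"; centre and width are made up."""
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = center[0] * (h - 1), center[1] * (w - 1)
    d2 = (ys - cy) ** 2 + (xs - cx) ** 2
    return np.exp(-d2 / (2.0 * (sigma * max(h, w)) ** 2))

def attention_mask(attn):
    """Min-max normalise a cross-attention response map to [0, 1].
    Assumption: the real Spatial Mask Extractor may post-process differently."""
    lo, hi = attn.min(), attn.max()
    if hi - lo < 1e-8:
        return np.zeros_like(attn, dtype=float)
    return (attn - lo) / (hi - lo)

def scheduled_mask(progress, attn, early=0.2, late=0.8, relax=0.5):
    """Choose the spatial mask for one denoising step.
    `progress` runs from 0.0 (first step) to 1.0 (last step).
    The three phases mirror the abstract; the thresholds are hypothetical."""
    h, w = attn.shape
    if progress < early:
        # Phase 1: attention is not yet informative, fall back to a prior.
        return gaussian_prior(h, w)
    m = attention_mask(attn)
    if progress < late:
        # Phase 2: use the attention-derived mask directly.
        return m
    # Phase 3: placeholder "relaxation", blending the mask toward its mean.
    return (1.0 - relax) * m + relax * m.mean()

def inject(hidden, id_feat, mask, scale=1.0):
    """Add identity features to `hidden` (H, W, C), weighted per pixel by `mask` (H, W).
    With an all-ones mask this reduces to spatially uniform injection."""
    return hidden + scale * mask[..., None] * id_feat

# Toy usage: a 2x2 attention map and 3-channel features.
attn = np.array([[0.0, 2.0], [4.0, 8.0]])
hidden = np.zeros((2, 2, 3))
id_feat = np.ones(3)
out = inject(hidden, id_feat, scheduled_mask(0.5, attn))
```

In this toy run the pixel with the strongest attention response receives the full identity feature and the pixel with the weakest response receives none, which is the contrast with uniform injection that the abstract describes.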

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Visual Attention and Saliency Detection · Face recognition and analysis