S$^2$Edit: Text-Guided Image Editing with Precise Semantic and Spatial Control

Xudong Liu; Zikun Chen; Ruowei Jiang; Ziyi Wu; Kejia Yin; Han Zhao; Parham Aarabi; Igor Gilitschenski

arXiv:2507.04584 · cs.CV · December 29, 2025

TL;DR

S$^2$Edit introduces a novel text-guided image editing method that achieves precise semantic and spatial control, preserving identity and details during localized edits by disentangling identity from attributes and guiding edits with object masks.

Contribution

The paper presents a new approach to personalized image editing with fine-grained control using a pre-trained diffusion model, including identity embedding, disentanglement, and mask-guided localization.
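The summary above does not say how the object masks are applied, so the following is only a generic illustration of mask-guided localization, not the paper's procedure. It assumes the simplest possible scheme: a per-pixel blend in which the edited result is kept inside the mask and the original image is kept outside it. The function name `blend_with_mask` and the nested-list image layout are hypothetical choices made for this sketch.

```python
def blend_with_mask(original, edited, mask):
    """Blend two same-sized 2D images using a mask with values in [0, 1].

    Assumption for illustration only: mask value 1 keeps the edited pixel,
    mask value 0 keeps the original pixel, values in between interpolate.
    """
    if not (len(original) == len(edited) == len(mask)):
        raise ValueError("original, edited and mask must have the same height")
    blended = []
    for o_row, e_row, m_row in zip(original, edited, mask):
        if not (len(o_row) == len(e_row) == len(m_row)):
            raise ValueError("original, edited and mask must have the same width")
        # Per-pixel convex combination of edited and original values.
        blended.append([m * e + (1.0 - m) * o
                        for o, e, m in zip(o_row, e_row, m_row)])
    return blended


if __name__ == "__main__":
    original = [[1.0, 2.0], [3.0, 4.0]]
    edited = [[9.0, 9.0], [9.0, 9.0]]
    mask = [[0.0, 1.0], [0.0, 0.0]]  # only the top-right pixel is editable
    print(blend_with_mask(original, edited, mask))
```

With a binary mask, pixels outside the masked region are returned unchanged, which is the property the summary describes as keeping irrelevant image regions unaltered.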

Findings

01. Outperforms state-of-the-art methods quantitatively and qualitatively.

02. Enables localized, identity-preserving edits with semantic and spatial precision.

03. Demonstrates applications like makeup transfer with high fidelity.

Abstract

Recent advances in diffusion models have enabled high-quality generation and manipulation of images guided by texts, as well as concept learning from images. However, naive applications of existing methods to editing tasks that require fine-grained control, e.g., face editing, often lead to suboptimal solutions with identity information and high-frequency details lost during the editing process, or irrelevant image regions altered due to entangled concepts. In this work, we propose S$^2$Edit, a novel method based on a pre-trained text-to-image diffusion model that enables personalized editing with precise semantic and spatial control. We first fine-tune our model to embed the identity information into a learnable text token. During fine-tuning, we disentangle the learned identity token from attributes to be edited by enforcing an orthogonality constraint in the textual feature space. To…
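The abstract mentions an orthogonality constraint between the learned identity token and the attributes to be edited, but the exact loss is not given in this excerpt. As a rough sketch of what such a constraint could look like, the snippet below penalizes the mean squared cosine similarity between one identity feature vector and a set of attribute feature vectors; the penalty is zero when the identity vector is orthogonal to every attribute vector. The function names, the use of cosine similarity, and the plain-list vectors are all assumptions made for illustration.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors given as lists of floats."""
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    """Cosine similarity; defined as 0.0 if either vector has zero norm."""
    norm_u = math.sqrt(dot(u, u))
    norm_v = math.sqrt(dot(v, v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot(u, v) / (norm_u * norm_v)


def orthogonality_penalty(identity_feat, attribute_feats):
    """Mean squared cosine similarity between identity and attribute features.

    Hypothetical stand-in for the paper's constraint: minimizing this value
    pushes the identity feature toward orthogonality with each attribute.
    """
    if not attribute_feats:
        return 0.0
    total = sum(cosine(identity_feat, a) ** 2 for a in attribute_feats)
    return total / len(attribute_feats)


if __name__ == "__main__":
    identity = [1.0, 0.0]
    attributes = [[0.0, 1.0], [1.0, 1.0]]
    print(orthogonality_penalty(identity, attributes))
```

In a real fine-tuning loop a term like this would be computed on text-encoder features with an autodiff framework and added to the diffusion training loss with some weight; both of those details are outside what the abstract states.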

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques

Methods: Diffusion