Training-Free Disentangled Text-Guided Image Editing via Sparse Latent Constraints
Mutiara Shabrina, Nova Kurnia Putri, Jefri Satria Ferdiansyah, Sabita Khansa Dewi, Novanto Yudistira

TL;DR
This paper proposes a training-free method for disentangled, text-guided image editing that applies sparse latent constraints to improve control and reduce attribute entanglement in face editing.
Contribution
It adds a sparsity-based (L1) regularization term to the Predict, Prevent, and Evaluate (PPE) framework, improving disentanglement and control in text-guided image editing without additional training.
Findings
Reduces unintended attribute changes in face editing.
Enforces more focused and controlled image edits.
Preserves facial identity during attribute modifications.
Abstract
Text-driven image manipulation often suffers from attribute entanglement, where modifying a target attribute (e.g., adding bangs) unintentionally alters other semantic properties such as identity or appearance. The Predict, Prevent, and Evaluate (PPE) framework addresses this issue by leveraging pre-trained vision-language models for disentangled editing. In this work, we analyze the PPE framework, focusing on its architectural components, including BERT-based attribute prediction and StyleGAN2-based image generation on the CelebA-HQ dataset. Through empirical analysis, we identify a limitation in the original regularization strategy, where latent updates remain dense and prone to semantic leakage. To mitigate this issue, we introduce a sparsity-based constraint using L1 regularization on latent space manipulation. Experimental results demonstrate that the proposed approach enforces…
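The effect of an L1 penalty on a latent update can be illustrated with a toy example. The sketch below is not the authors' implementation: the quadratic edit loss, the function names (`soft_threshold`, `sparse_edit`), and the values of `lam` and `lr` are all assumptions chosen for illustration, and no StyleGAN2 or vision-language model is involved. It only shows how proximal gradient descent with soft-thresholding drives weak latent channels to exactly zero while keeping the strong ones.

```python
# Hypothetical sketch: a sparse latent offset via L1-regularized proximal
# gradient descent (ISTA). The quadratic loss is a stand-in for a real
# text-guided editing loss, which is not reproduced here.

def soft_threshold(x, t):
    """Proximal operator of t * |x|: shrink x toward zero by t."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0

def sparse_edit(target, lam=0.1, lr=0.5, steps=200):
    """Minimize 0.5 * ||delta - target||^2 + lam * ||delta||_1 over delta."""
    delta = [0.0] * len(target)
    for _ in range(steps):
        # Gradient of the stand-in edit loss 0.5 * ||delta - target||^2.
        grad = [d - t for d, t in zip(delta, target)]
        # Gradient step on the smooth part, then the L1 proximal step.
        delta = [soft_threshold(d - lr * g, lr * lam)
                 for d, g in zip(delta, grad)]
    return delta

# A dense, unconstrained edit direction (assumed values): two strong
# channels and several weak ones that stand in for semantic leakage.
dense_direction = [1.0, 0.05, -0.03, 0.08, -0.6, 0.02]
sparse_delta = sparse_edit(dense_direction, lam=0.1)

# Indices of latent channels that remain non-zero after the L1 constraint.
active = [i for i, d in enumerate(sparse_delta) if d != 0.0]
print(active)  # -> [0, 4]
```

In this toy setting the weak channels are zeroed out and the strong ones are shrunk by `lam`, which mirrors the intuition that a sparse update touches fewer latent dimensions and so has fewer chances to alter unrelated attributes.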
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Face recognition and analysis · Digital Media Forensic Detection
