Training-Free Disentangled Text-Guided Image Editing via Sparse Latent Constraints

Mutiara Shabrina; Nova Kurnia Putri; Jefri Satria Ferdiansyah; Sabita Khansa Dewi; Novanto Yudistira

arXiv:2512.21637·cs.CV·December 29, 2025

Open Access

TL;DR

This paper proposes a training-free method for disentangled, text-guided image editing. It applies sparse constraints to latent updates to improve control and reduce attribute entanglement in face editing tasks.

Contribution

It adds a sparsity-based (L1) regularization term on latent updates to the Predict, Prevent, and Evaluate (PPE) framework, improving disentanglement and control in text-guided image editing without additional training.

Findings

01. Reduces unintended attribute changes in face editing.

02. Enforces more focused and controlled image edits.

03. Preserves facial identity during attribute modifications.

Abstract

Text-driven image manipulation often suffers from attribute entanglement, where modifying a target attribute (e.g., adding bangs) unintentionally alters other semantic properties such as identity or appearance. The Predict, Prevent, and Evaluate (PPE) framework addresses this issue by leveraging pre-trained vision-language models for disentangled editing. In this work, we analyze the PPE framework, focusing on its architectural components, including BERT-based attribute prediction and StyleGAN2-based image generation on the CelebA-HQ dataset. Through empirical analysis, we identify a limitation in the original regularization strategy, where latent updates remain dense and prone to semantic leakage. To mitigate this issue, we introduce a sparsity-based constraint using L1 regularization on latent space manipulation. Experimental results demonstrate that the proposed approach enforces…
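
To illustrate why an L1 penalty on the latent update leads to focused edits, here is a minimal sketch. It is not the paper's implementation: the text-guided edit loss is replaced by a stand-in quadratic loss that pulls the update `delta` toward a hypothetical dense edit direction, and the names `dense_direction`, `lam`, and `sparse_latent_update` are assumptions made for this example. The L1 term is handled with proximal gradient descent (ISTA), which sets small coordinates exactly to zero.

```python
def soft_threshold(x, t):
    """Proximal operator of t * |x|: shrink x toward zero by t."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def sparse_latent_update(dense_direction, lam, steps=200, lr=0.5):
    """Minimize 0.5 * ||delta - dense_direction||^2 + lam * ||delta||_1.

    The quadratic term is only a stand-in for a real edit loss
    (an assumption of this sketch, not the paper's objective).
    """
    delta = [0.0] * len(dense_direction)
    for _ in range(steps):
        # Gradient step on the smooth stand-in edit loss.
        stepped = [d - lr * (d - g) for d, g in zip(delta, dense_direction)]
        # Proximal step for the L1 penalty.
        delta = [soft_threshold(s, lr * lam) for s in stepped]
    return delta


# Hypothetical dense update: two strong coordinates, four weak ones.
dense_direction = [0.90, 0.05, -0.03, -0.70, 0.02, 0.04]
delta = sparse_latent_update(dense_direction, lam=0.1)

# Indices of latent coordinates that the sparse update still changes.
nonzero = [i for i, d in enumerate(delta) if d != 0.0]
print(nonzero)
```

In this toy setting only the two strong coordinates survive, each shrunk by `lam`; the weak coordinates, which stand in for leakage into unrelated attributes, are driven exactly to zero. A real pipeline would replace the quadratic term with the framework's own manipulation loss and tune `lam` to trade edit strength against sparsity.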

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Face recognition and analysis · Digital Media Forensic Detection