CLIP-PAE: Projection-Augmentation Embedding to Extract Relevant Features for a Disentangled, Interpretable, and Controllable Text-Guided Face Manipulation

Chenliang Zhou; Fangcheng Zhong; Cengiz Oztireli

arXiv:2210.03919 · cs.CV · September 3, 2025 · 1 citation

Open Access

TL;DR

CLIP-PAE introduces a projection-augmentation embedding technique that enhances text-guided face manipulation by improving disentanglement, interpretability, and control, while maintaining high image quality.

Contribution

The paper proposes CLIP-PAE, a novel optimization target that leverages corpus subspaces to improve image manipulation in CLIP-based models, addressing artifacts and control issues.

Findings

1. PAE improves disentanglement and interpretability in face editing.

2. The method achieves state-of-the-art quality and accuracy.

3. It is easily adaptable to existing CLIP-based algorithms.

Abstract

Recently introduced Contrastive Language-Image Pre-Training (CLIP) bridges images and text by embedding them into a joint latent space. This opens the door to ample literature that aims to manipulate an input image by providing a textual explanation. However, due to the discrepancy between image and text embeddings in the joint space, using text embeddings as the optimization target often introduces undesired artifacts in the resulting images. Disentanglement, interpretability, and controllability are also hard to guarantee for manipulation. To alleviate these problems, we propose to define corpus subspaces spanned by relevant prompts to capture specific image characteristics. We introduce CLIP Projection-Augmentation Embedding (PAE) as an optimization target to improve the performance of text-guided image manipulation. Our method is a simple and general paradigm that can be easily…
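The projection-and-augmentation idea described in the abstract can be illustrated with a small NumPy sketch. The abstract does not give the exact formula, so everything below is an assumption made for illustration: the function names, the strength parameter `alpha`, the choice to add the projected text embedding to the image embedding, and the final normalization. Random vectors stand in for real CLIP embeddings.

```python
import numpy as np


def orthonormal_basis(prompt_embeddings, tol=1e-10):
    """Orthonormal basis (as rows) for the span of the prompt embeddings."""
    # Right singular vectors with non-negligible singular values span the row space.
    _, s, vt = np.linalg.svd(prompt_embeddings, full_matrices=False)
    return vt[s > tol]


def project(v, basis):
    """Orthogonal projection of v onto the subspace spanned by the basis rows."""
    return basis.T @ (basis @ v)


def projection_augmentation(image_emb, text_emb, basis, alpha):
    """Hypothetical PAE-style target (an assumption, not the paper's exact formula).

    The image embedding is split into a part inside the corpus subspace and a
    residual outside it. Only the in-subspace part is pushed toward the text.
    """
    proj_img = project(image_emb, basis)
    residual = image_emb - proj_img          # kept unchanged
    proj_txt = project(text_emb, basis)      # text direction restricted to the subspace
    target = residual + proj_img + alpha * proj_txt
    return target / np.linalg.norm(target)   # unit length, as is usual for CLIP embeddings


# Stand-in data: 5 "relevant prompts" in an illustrative 512-dimensional space.
rng = np.random.default_rng(0)
dim = 512
corpus = rng.normal(size=(5, dim))
basis = orthonormal_basis(corpus)

image_emb = rng.normal(size=dim)
image_emb /= np.linalg.norm(image_emb)
text_emb = rng.normal(size=dim)
text_emb /= np.linalg.norm(text_emb)

target = projection_augmentation(image_emb, text_emb, basis, alpha=2.0)
```

In this sketch the component of the target outside the corpus subspace stays parallel to the original image embedding's residual, which is one way to read the disentanglement claim: attributes not captured by the chosen prompts are not pushed toward the text.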

Taxonomy

Topics: Facial Nerve Paralysis Treatment and Research · Face recognition and analysis · Herpesvirus Infections and Treatments

Methods: Contrastive Language-Image Pre-training