CLIP-PAE: Projection-Augmentation Embedding to Extract Relevant Features for a Disentangled, Interpretable, and Controllable Text-Guided Face Manipulation
Chenliang Zhou, Fangcheng Zhong, Cengiz Oztireli

TL;DR
CLIP-PAE introduces a projection-augmentation embedding technique that enhances text-guided face manipulation by improving disentanglement, interpretability, and control, while maintaining high image quality.
Contribution
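The paper proposes CLIP-PAE, a projection-augmentation embedding used as the optimization target in place of the raw text embedding. It is built from corpus subspaces spanned by relevant prompts, and it targets the artifacts and loss of control caused by the discrepancy between image and text embeddings in CLIP's joint space.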
Findings
PAE improves disentanglement and interpretability in face editing.
The authors report state-of-the-art quality and accuracy for text-guided face manipulation.
It is easily adaptable to existing CLIP-based algorithms.
Abstract
Recently introduced Contrastive Language-Image Pre-Training (CLIP) bridges images and text by embedding them into a joint latent space. This opens the door to ample literature that aims to manipulate an input image by providing a textual explanation. However, due to the discrepancy between image and text embeddings in the joint space, using text embeddings as the optimization target often introduces undesired artifacts in the resulting images. Disentanglement, interpretability, and controllability are also hard to guarantee for manipulation. To alleviate these problems, we propose to define corpus subspaces spanned by relevant prompts to capture specific image characteristics. We introduce CLIP Projection-Augmentation Embedding (PAE) as an optimization target to improve the performance of text-guided image manipulation. Our method is a simple and general paradigm that can be easily…
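The idea of a corpus subspace and a projection-based optimization target can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the QR-based basis, the additive combination with weight alpha, and the final renormalization are all assumptions made for illustration, and random vectors stand in for real CLIP embeddings.

```python
import numpy as np


def orthonormal_basis(prompt_embeddings):
    """Return a (d, k) orthonormal basis for the span of k prompt embeddings.

    Assumption: the corpus subspace is taken to be the linear span of the
    prompt embeddings, and QR decomposition is used to orthonormalize it.
    """
    q, _ = np.linalg.qr(prompt_embeddings.T)  # reduced QR: q has shape (d, k)
    return q


def project(vec, basis):
    """Orthogonally project a d-dimensional vector onto the subspace."""
    return basis @ (basis.T @ vec)


def projection_augmented_target(image_emb, text_emb, basis, alpha=1.0):
    """Hypothetical target: projected image embedding plus a scaled text embedding.

    Assumption: the combination rule and the renormalization are illustrative
    choices, not the formula from the paper.
    """
    target = project(image_emb, basis) + alpha * text_emb
    return target / np.linalg.norm(target)


# Stand-in data: random vectors instead of real CLIP embeddings.
rng = np.random.default_rng(0)
d, k = 512, 8
corpus = rng.normal(size=(k, d))                 # k "relevant prompt" embeddings
image_emb = rng.normal(size=d)
image_emb /= np.linalg.norm(image_emb)
text_emb = corpus[0] / np.linalg.norm(corpus[0])  # target prompt drawn from the corpus

basis = orthonormal_basis(corpus)
target = projection_augmented_target(image_emb, text_emb, basis, alpha=2.0)
```

Because the target prompt here is drawn from the corpus, the resulting vector lies entirely inside the corpus subspace, so components of the image embedding unrelated to the chosen prompts do not enter the target. In a real pipeline the embeddings would come from a CLIP text and image encoder, and the target would replace the plain text embedding in the manipulation loss.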
Taxonomy
Topics: Facial Nerve Paralysis Treatment and Research · Face recognition and analysis · Herpesvirus Infections and Treatments
Methods: Contrastive Language-Image Pre-training
