Towards Counterfactual Image Manipulation via CLIP
Yingchen Yu, Fangneng Zhan, Rongliang Wu, Jiahui Zhang, Shijian Lu, Miaomiao Cui, Xuansong Xie, Xian-Sheng Hua, Chunyan Miao

TL;DR
This paper introduces a novel text-driven counterfactual image editing method using CLIP, enabling realistic and accurate modifications of images based on counterfactual concepts by exploiting semantic directions and embedding mapping.
Contribution
The work proposes a new contrastive loss and embedding mapping scheme to improve counterfactual image editing with CLIP, addressing challenges of semantic guidance and editing precision.
Findings
Achieves accurate counterfactual image editing driven by target texts.
Produces realistic edits that align with desired counterfactual concepts.
Demonstrates effectiveness across diverse semantic editing scenarios.
Abstract
Leveraging StyleGAN's expressivity and its disentangled latent codes, existing methods can achieve realistic editing of different visual attributes such as age and gender of facial images. An intriguing yet challenging problem arises: Can generative models achieve counterfactual editing against their learnt priors? Due to the lack of counterfactual samples in natural datasets, we investigate this problem in a text-driven manner with Contrastive-Language-Image-Pretraining (CLIP), which can offer rich semantic knowledge even for various counterfactual concepts. Unlike in-domain manipulation, counterfactual manipulation requires more comprehensive exploitation of the semantic knowledge encapsulated in CLIP, as well as more delicate handling of editing directions to avoid getting stuck in local minima or producing undesired edits. To this end, we design a novel contrastive loss that exploits…
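The abstract refers to "editing directions" in CLIP's embedding space. As a rough illustration of that general idea only (not the paper's contrastive loss, whose definition is cut off above), the sketch below scores how well an image-space change lines up with a text-space change using cosine similarity. The function names and the random vectors standing in for CLIP embeddings are assumptions made for this example; a real pipeline would obtain the embeddings from a CLIP image and text encoder.

```python
import numpy as np

def unit(v):
    """Scale a vector to unit length, since CLIP embeddings are usually compared by cosine."""
    return v / np.linalg.norm(v)

def directional_loss(src_img, edit_img, src_txt, tgt_txt):
    """Return 1 - cosine between the image edit direction and the text direction.

    Hypothetical helper for illustration: 0 means the edit moved the image
    embedding exactly along the direction from source text to target text.
    """
    d_img = unit(edit_img - src_img)
    d_txt = unit(tgt_txt - src_txt)
    return 1.0 - float(np.dot(d_img, d_txt))

# Random stand-ins for CLIP embeddings (assumption: 512-dimensional vectors).
rng = np.random.default_rng(0)
dim = 512
src_img, src_txt, tgt_txt = (unit(rng.normal(size=dim)) for _ in range(3))

# An edit that follows the text direction, and one that moves in an unrelated direction.
aligned_edit = src_img + 0.5 * (tgt_txt - src_txt)
unrelated_edit = src_img + 0.5 * unit(rng.normal(size=dim))

loss_aligned = directional_loss(src_img, aligned_edit, src_txt, tgt_txt)
loss_unrelated = directional_loss(src_img, unrelated_edit, src_txt, tgt_txt)
print(f"aligned: {loss_aligned:.4f}, unrelated: {loss_unrelated:.4f}")
```

In this toy setup the aligned edit yields a loss near zero, while an unrelated edit yields a loss near one, which is the kind of signal a text-driven editor could minimize when searching for a latent-code update.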
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Video Analysis and Summarization
Methods: Contrastive Language-Image Pre-training
