Towards Counterfactual Image Manipulation via CLIP

Yingchen Yu; Fangneng Zhan; Rongliang Wu; Jiahui Zhang; Shijian Lu; Miaomiao Cui; Xuansong Xie; Xian-Sheng Hua; Chunyan Miao

arXiv:2207.02812·cs.CV·July 13, 2022

TL;DR

This paper introduces a novel text-driven counterfactual image editing method using CLIP, enabling realistic and accurate modifications of images based on counterfactual concepts by exploiting semantic directions and embedding mapping.

Contribution

The work proposes a new contrastive loss and embedding mapping scheme to improve counterfactual image editing with CLIP, addressing challenges of semantic guidance and editing precision.

Findings

01 Achieves accurate counterfactual image editing driven by target texts.

02 Produces realistic edits that align with desired counterfactual concepts.

03 Demonstrates effectiveness across diverse semantic editing scenarios.

Abstract

Leveraging StyleGAN's expressivity and its disentangled latent codes, existing methods can achieve realistic editing of different visual attributes, such as the age and gender of facial images. An intriguing yet challenging problem arises: can generative models achieve counterfactual editing against their learnt priors? Due to the lack of counterfactual samples in natural datasets, we investigate this problem in a text-driven manner with Contrastive Language-Image Pre-training (CLIP), which can offer rich semantic knowledge even for various counterfactual concepts. Unlike in-domain manipulation, counterfactual manipulation requires more comprehensive exploitation of the semantic knowledge encapsulated in CLIP, as well as more delicate handling of editing directions to avoid getting stuck in local minima or producing undesired edits. To this end, we design a novel contrastive loss that exploits…
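As a rough illustration of what "editing directions" guided by CLIP can mean, the sketch below computes a generic directional objective: the shift between source and edited image embeddings is compared with the shift between source and target text embeddings. This is not the paper's contrastive loss, whose description is truncated above. The vectors are hand-made toy stand-ins for CLIP embeddings, and the function names `cosine` and `directional_loss` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("direction vectors must be non-zero")
    return dot / (norm_u * norm_v)


def directional_loss(img_src, img_edit, txt_src, txt_tgt):
    """1 - cos(image shift, text shift); lower means the edit follows the text."""
    img_dir = [e - s for e, s in zip(img_edit, img_src)]
    txt_dir = [t - s for t, s in zip(txt_tgt, txt_src)]
    return 1.0 - cosine(img_dir, txt_dir)


# Toy 3-d "embeddings" (assumed values, not real CLIP outputs).
img_src = [1.0, 0.0, 0.0]          # source image
txt_src = [0.9, 0.1, 0.0]          # neutral source text
txt_tgt = [0.1, 0.9, 0.0]          # counterfactual target text
aligned_edit = [0.2, 0.8, 0.0]     # edit that moves along the text direction
off_target_edit = [1.0, 0.0, 0.8]  # edit that moves orthogonally to it

loss_aligned = directional_loss(img_src, aligned_edit, txt_src, txt_tgt)
loss_off = directional_loss(img_src, off_target_edit, txt_src, txt_tgt)
print(loss_aligned < loss_off)
```

In a real pipeline the four vectors would come from CLIP's image and text encoders, and the loss would be minimized with respect to the generator's latent code; both steps are outside the scope of this sketch.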

Code & Models

Repositories

yingchen001/cf-clip (PyTorch, official)

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Video Analysis and Summarization

Methods: Contrastive Language-Image Pre-training