Towards Arbitrary Text-driven Image Manipulation via Space Alignment

Yunpeng Bai; Zihan Zhong; Chao Dong; Weichen Zhang; Guowei Xu; Chun Yuan

arXiv:2301.10670 · cs.CV · September 22, 2023

Open Access

TL;DR

This paper introduces TMSA, a novel framework that enables real-time, arbitrary text-driven image editing by aligning semantic spaces of CLIP and StyleGAN, eliminating the need for costly optimization.

Contribution

The proposed Space Alignment module allows direct semantic manipulation in StyleGAN space based on text, improving efficiency and flexibility over previous methods.

Findings

1. Supports real-time, arbitrary text-driven image editing
2. Achieves superior performance compared to prior methods
3. Eliminates additional optimization costs

Abstract

Recent GAN inversion methods can successfully invert a real input image into a corresponding editable latent code in StyleGAN. Combining them with a language-vision model (CLIP) has led to several text-driven image manipulation methods. However, these methods incur extra cost because they must perform optimization for each particular image or each new attribute editing mode. To make editing more efficient, we propose a new Text-driven image Manipulation framework via Space Alignment (TMSA). The Space Alignment module aims to align the same semantic regions in the CLIP and StyleGAN spaces. The text input can then be mapped directly into the StyleGAN space and used to find the semantic shift that matches the text description. The framework can support arbitrary image editing modes without additional cost. Our work provides the user with an interface to control the attributes…
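To make the idea of a "semantic shift" found through an aligned space more concrete, here is a minimal toy sketch. It is not the TMSA architecture, which the abstract does not specify. It assumes, purely for illustration, that the alignment is a single linear map fitted by least squares on paired embeddings, and it uses small synthetic vectors in place of real CLIP embeddings and StyleGAN latent codes. All names (`fit_alignment`, `text_shift`, `edit_latent`) and dimensions are hypothetical.

```python
import numpy as np

def fit_alignment(clip_emb, latents):
    """Fit a linear map A (assumption, not the paper's module) so that
    clip_emb @ A approximates latents in the least-squares sense."""
    A, *_ = np.linalg.lstsq(clip_emb, latents, rcond=None)
    return A

def text_shift(A, src_text_emb, tgt_text_emb):
    """Map the difference of two text embeddings into the latent space.
    The result plays the role of a semantic shift direction."""
    return (tgt_text_emb - src_text_emb) @ A

def edit_latent(w, shift, strength=1.0):
    """Move a latent code along the shift; no per-image optimization."""
    return w + strength * shift

# Synthetic stand-ins: real CLIP and StyleGAN spaces are much larger.
rng = np.random.default_rng(0)
clip_dim, latent_dim, n_pairs = 8, 6, 200
true_map = rng.normal(size=(clip_dim, latent_dim))
clip_emb = rng.normal(size=(n_pairs, clip_dim))
latents = clip_emb @ true_map  # noiseless paired data for the toy case

A = fit_alignment(clip_emb, latents)

src = rng.normal(size=clip_dim)   # hypothetical embedding of a neutral text
tgt = rng.normal(size=clip_dim)   # hypothetical embedding of the target text
w = rng.normal(size=latent_dim)   # hypothetical inverted latent code

shift = text_shift(A, src, tgt)
w_edited = edit_latent(w, shift, strength=0.5)
```

Once the map is fitted, every new text prompt costs only one matrix product, which is the kind of saving the abstract attributes to avoiding per-image or per-attribute optimization.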

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Digital Media Forensic Detection · Handwritten Text Recognition Techniques

Methods: StyleGAN · Dense Connections · Adaptive Instance Normalization · Convolution · Feedforward Network · R1 Regularization · Contrastive Language-Image Pre-training · ALIGN