Towards Arbitrary Text-driven Image Manipulation via Space Alignment
Yunpeng Bai, Zihan Zhong, Chao Dong, Weichen Zhang, Guowei Xu, Chun Yuan

TL;DR
This paper introduces TMSA, a novel framework that enables real-time, arbitrary text-driven image editing by aligning semantic spaces of CLIP and StyleGAN, eliminating the need for costly optimization.
Contribution
The proposed Space Alignment module allows direct semantic manipulation in StyleGAN space based on text, improving efficiency and flexibility over previous methods.
Findings
Supports real-time, arbitrary text-driven image editing
Achieves superior performance compared to prior methods
Eliminates additional optimization costs
Abstract
Recent GAN inversion methods can successfully invert a real input image into the corresponding editable latent code in StyleGAN. Combined with the language-vision model CLIP, several text-driven image manipulation methods have been proposed. However, these methods incur extra cost, because they require optimization for each image or for each new attribute editing mode. To achieve more efficient editing, we propose a new Text-driven image Manipulation framework via Space Alignment (TMSA). The Space Alignment module aims to align the same semantic regions in the CLIP and StyleGAN spaces. The text input can then be mapped directly into the StyleGAN space and used to find the semantic shift that matches the text description. The framework supports arbitrary image editing modes without additional cost. Our work provides the user with an interface to control the attributes…
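The idea in the abstract of mapping text into the StyleGAN space and then finding a semantic shift can be illustrated with a toy sketch. This is not the authors' implementation: the linear map `ALIGN`, the helper names, and the three- and four-dimensional vectors are all hypothetical stand-ins for a learned alignment module, real CLIP text embeddings, and a real StyleGAN latent code.

```python
from typing import List, Sequence

Vector = List[float]


def apply_alignment(matrix: Sequence[Sequence[float]],
                    clip_embedding: Sequence[float]) -> Vector:
    """Project a CLIP-space vector into latent space.

    Assumption: the alignment is a plain linear map. The paper's Space
    Alignment module may be something quite different.
    """
    return [sum(m * x for m, x in zip(row, clip_embedding)) for row in matrix]


def semantic_shift(matrix: Sequence[Sequence[float]],
                   source_text_emb: Sequence[float],
                   target_text_emb: Sequence[float]) -> Vector:
    """Latent-space direction from a source description to a target one."""
    src = apply_alignment(matrix, source_text_emb)
    tgt = apply_alignment(matrix, target_text_emb)
    return [t - s for s, t in zip(src, tgt)]


def edit_latent(latent: Sequence[float],
                shift: Sequence[float],
                strength: float = 1.0) -> Vector:
    """Move a latent code along the shift; no per-image optimization."""
    return [w + strength * d for w, d in zip(latent, shift)]


# Hypothetical 4x3 alignment matrix (latent dim 4, text-embedding dim 3).
ALIGN = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.5, 0.5, 0.0],
]

neutral_text = [1.0, 0.0, 0.0]  # toy embedding, e.g. "a face"
target_text = [1.0, 1.0, 0.0]   # toy embedding, e.g. "a smiling face"
latent = [0.0, 0.0, 0.0, 0.0]   # toy inverted latent code

shift = semantic_shift(ALIGN, neutral_text, target_text)
edited = edit_latent(latent, shift, strength=2.0)
```

In this sketch a new text prompt costs only one matrix-vector product and one vector addition, which is the kind of optimization-free editing the abstract describes. In a real pipeline the edited latent would then be passed to the StyleGAN generator.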
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Digital Media Forensic Detection · Handwritten Text Recognition Techniques
Methods: StyleGAN · Dense Connections · Adaptive Instance Normalization · Convolution · Feedforward Network · R1 Regularization · Contrastive Language-Image Pre-training · ALIGN
