Image Manipulation with Natural Language using Two-sided Attentive Conditional Generative Adversarial Network
Dawei Zhu, Aditya Mogadala, Dietrich Klakow

TL;DR
This paper introduces TEA-cGAN, a novel natural language-driven image manipulation model that uses two-sided attention to generate high-resolution, semantically altered images while preserving background details, outperforming existing methods.
Contribution
The paper proposes TEA-cGAN, a two-sided attentive conditional GAN that enables precise image editing using natural language, with improved quality and content preservation.
Findings
TEA-cGAN outperforms existing methods on CUB and Oxford-102 datasets.
It generates high-resolution images at 128x128 and 256x256.
The model effectively preserves background while altering specified objects.
Abstract
Altering the content of an image with photo editing tools is a tedious task for an inexperienced user, especially when modifying the visual attributes of a specific object in an image without affecting other constituents such as the background. To simplify the process of image manipulation and to provide more control to users, it is better to utilize a simpler interface like natural language. Therefore, in this paper, we address the challenge of manipulating images using natural language descriptions. We propose the Two-sidEd Attentive conditional Generative Adversarial Network (TEA-cGAN) to generate semantically manipulated images while keeping other contents such as the background intact. TEA-cGAN uses fine-grained attention in both the generator and the discriminator of a Generative Adversarial Network (GAN) based framework at different scales. Experimental results show that TEA-cGAN which…
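The fine-grained attention mentioned in the abstract can be illustrated with a minimal sketch in which each image region attends over the words of the description. This is not the authors' implementation: the scaled dot-product scoring, the function name `word_region_attention`, and the feature shapes are all assumptions chosen for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def word_region_attention(regions, words):
    """Hypothetical word-level attention (an assumption, not the paper's exact form).

    regions: (N, D) array of image region features.
    words:   (T, D) array of word embeddings in the same space.
    Returns a (N, D) word-context vector per region and the (N, T) weights.
    """
    # Similarity between every region and every word, scaled by sqrt(D).
    scores = regions @ words.T / np.sqrt(regions.shape[1])
    # Each region distributes its attention over the T words.
    weights = softmax(scores, axis=1)
    # Weighted sum of word embeddings for each region.
    context = weights @ words
    return context, weights

rng = np.random.default_rng(0)
regions = rng.normal(size=(16, 8))  # e.g. a 4x4 feature map, flattened
words = rng.normal(size=(5, 8))     # a 5-word description
context, weights = word_region_attention(regions, words)
```

In a two-sided setup of this kind, a similar attention step could be applied in both the generator and the discriminator, at more than one feature-map scale; how TEA-cGAN does this exactly is described in the paper itself.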
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Advanced Image Processing Techniques · Digital Media Forensic Detection
