IR-GAN: Image Manipulation with Linguistic Instruction by Increment Reasoning
Zhenhuan Liu, Jincan Deng, Liang Li, Shaofei Cai, Qianqian Xu, Shuhui Wang, Qingming Huang

TL;DR
IR-GAN is a novel model that improves image manipulation by reasoning about the consistency between visual and semantic increments, which makes images generated from linguistic instructions more logically coherent.
Contribution
The paper introduces IR-GAN, which incorporates semantic and visual increment reasoning with a new discriminator to improve multimodal conditional image generation.
Findings
IR-GAN effectively measures and enforces consistency between image and instruction increments.
Experimental results demonstrate IR-GAN's superior performance on two datasets.
Visualization confirms improved logical coherence in generated images.
Abstract
Conditional image generation, which includes text-to-image generation and image translation, is an active research topic. Recently, image manipulation with linguistic instructions has introduced new challenges for multimodal conditional generation. However, traditional conditional image generation models mainly focus on generating high-quality, visually realistic images and fail to resolve the partial consistency between image and instruction. To address this issue, we propose an Increment Reasoning Generative Adversarial Network (IR-GAN), which aims to reason about the consistency between the visual increment in images and the semantic increment in instructions. First, we introduce word-level and instruction-level instruction encoders to learn the user's intention from history-correlated instructions as the semantic increment. Second, we embed the representation of the semantic increment into that of the source image for…
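To make the idea of "consistency between visual increment and semantic increment" concrete, the sketch below treats the visual increment as the element-wise change from source-image features to target-image features and scores it against a semantic-increment vector with cosine similarity. This is only an illustration under stated assumptions: the function names, the feature-difference definition of the visual increment, and the use of cosine similarity are hypothetical stand-ins, not IR-GAN's actual encoders or reasoning discriminator.

```python
import math
from typing import List, Sequence


def visual_increment(source_feat: Sequence[float], target_feat: Sequence[float]) -> List[float]:
    """Hypothetical visual increment: element-wise change from source to target image features."""
    if len(source_feat) != len(target_feat):
        raise ValueError("feature vectors must have the same length")
    return [t - s for s, t in zip(source_feat, target_feat)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 when either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def increment_consistency(
    source_feat: Sequence[float],
    target_feat: Sequence[float],
    semantic_inc: Sequence[float],
) -> float:
    """Hypothetical consistency score in [-1, 1] between visual and semantic increments."""
    return cosine(visual_increment(source_feat, target_feat), semantic_inc)


if __name__ == "__main__":
    # The image changed only along the first feature; an instruction pointing the same way scores 1.0.
    print(increment_consistency([0.0, 0.0, 1.0], [1.0, 0.0, 1.0], [2.0, 0.0, 0.0]))
    # An instruction describing an unrelated change scores 0.0.
    print(increment_consistency([0.0, 0.0, 1.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]))
```

Scoring the change rather than the whole target image reflects the "partial consistency" the abstract describes: an instruction only needs to agree with what was edited, not with the unchanged parts of the source image.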
