IR-GAN: Image Manipulation with Linguistic Instruction by Increment Reasoning

Zhenhuan Liu; Jincan Deng; Liang Li; Shaofei Cai; Qianqian Xu; Shuhui Wang; Qingming Huang

arXiv:2204.00792·cs.CV·April 5, 2022

TL;DR

IR-GAN is a model for instruction-driven image manipulation that reasons about the consistency between the visual increment in images and the semantic increment in instructions, improving the logical coherence of the generated images.

Contribution

The paper introduces IR-GAN, which incorporates semantic and visual increment reasoning with a new discriminator to improve multimodal conditional image generation.

Findings

01

IR-GAN effectively measures and enforces consistency between image and instruction increments.

02

Experimental results demonstrate IR-GAN's superior performance on two datasets.

03

Visualization confirms improved logical coherence in generated images.
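
To make the first finding concrete, here is a toy sketch of what "consistency between image and instruction increments" could mean numerically. It is not the paper's discriminator: the feature vectors, the function names (`increment`, `cosine`, `increment_consistency`), and the use of cosine similarity as the score are all assumptions made for illustration only.

```python
import math

def increment(source, target):
    """Element-wise difference target - source (a stand-in 'visual increment')."""
    return [t - s for s, t in zip(source, target)]

def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def increment_consistency(source_feat, target_feat, semantic_increment):
    """Hypothetical score: how well the change in image features aligns
    with the embedding of the instruction (the 'semantic increment')."""
    return cosine(increment(source_feat, target_feat), semantic_increment)

# Hypothetical 3-d features: only the second dimension changes between images.
source_feat = [0.2, 0.5, 0.1]
target_feat = [0.2, 0.9, 0.1]
aligned_instruction = [0.0, 1.0, 0.0]    # asks for exactly that change
unrelated_instruction = [1.0, 0.0, 0.0]  # asks for a different change

score_aligned = increment_consistency(source_feat, target_feat, aligned_instruction)
score_unrelated = increment_consistency(source_feat, target_feat, unrelated_instruction)
```

In this toy setup the aligned instruction scores close to 1 and the unrelated one scores 0. A learned discriminator would replace the fixed cosine score with a trained network.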

Abstract

Conditional image generation is an active research topic that includes text-to-image generation and image translation. Recently, image manipulation with linguistic instruction has brought new challenges for multimodal conditional generation. However, traditional conditional image generation models mainly focus on generating high-quality, visually realistic images and fail to resolve the partial consistency between image and instruction. To address this issue, we propose an Increment Reasoning Generative Adversarial Network (IR-GAN), which aims to reason about the consistency between the visual increment in images and the semantic increment in instructions. First, we introduce word-level and instruction-level instruction encoders to learn the user's intention from history-correlated instructions as the semantic increment. Second, we embed the representation of the semantic increment into that of the source image for…
