LatteGAN: Visually Guided Language Attention for Multi-Turn Text-Conditioned Image Manipulation

Shoya Matsumori, Yuki Abe, Kosuke Shingyouchi, Komei Sugiura, and Michita Imai

arXiv:2112.13985 · cs.CV · June 3, 2022

TL;DR

LatteGAN is a model for multi-turn text-guided image manipulation that combines a visually guided language attention module with a text-conditioned discriminator, achieving state-of-the-art results on the CoDraw and i-CLEVR datasets.

Contribution

The paper introduces LatteGAN, a new architecture with a visually guided language attention module and a text-conditioned discriminator for enhanced multi-turn image manipulation.

Findings

01. Achieves state-of-the-art performance on the CoDraw and i-CLEVR datasets.

02. Addresses under-generation and quality issues in multi-turn image manipulation.

03. Demonstrates significant improvement over previous models.

Abstract

Text-guided image manipulation tasks have recently gained attention in the vision-and-language community. While most of the prior studies focused on single-turn manipulation, our goal in this paper is to address the more challenging multi-turn image manipulation (MTIM) task. Previous models for this task successfully generate images iteratively, given a sequence of instructions and a previously generated image. However, this approach suffers from under-generation and a lack of generated quality of the objects that are described in the instructions, which consequently degrades the overall performance. To overcome these problems, we present a novel architecture called a Visually Guided Language Attention GAN (LatteGAN). Here, we address the limitations of the previous approaches by introducing a Visually Guided Language Attention (Latte) module, which extracts fine-grained text…
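
The iterative setup described in the abstract, where each turn conditions on the instructions and the previously generated image, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `run_multi_turn` and `toy_generator` are hypothetical names, and the "image" is a plain tuple standing in for an image tensor.

```python
from typing import Callable, List, Sequence, Tuple

# Toy stand-in for an image: a tuple of strings (assumption for illustration only).
Image = Tuple[str, ...]


def run_multi_turn(
    generator: Callable[[Image, List[str]], Image],
    initial_image: Image,
    instructions: Sequence[str],
) -> List[Image]:
    """Apply instructions one turn at a time, feeding each output back in."""
    images: List[Image] = []
    history: List[str] = []
    current = initial_image
    for instruction in instructions:
        history.append(instruction)
        # Each turn sees the previous image and the instructions so far.
        current = generator(current, list(history))
        images.append(current)
    return images


def toy_generator(prev_image: Image, instructions_so_far: List[str]) -> Image:
    """Hypothetical generator: 'draws' by appending the newest instruction."""
    return prev_image + (instructions_so_far[-1],)


if __name__ == "__main__":
    turns = ["add a red cube", "add a blue sphere"]
    for step, image in enumerate(run_multi_turn(toy_generator, (), turns), 1):
        print(step, image)
```

In a real model the generator would be a trained network; the loop structure is the only part this sketch is meant to show.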

Code & Models

Repositories

smatsumori/lattegan
PyTorch · Official

Taxonomy

Methods: Max Pooling · Concatenated Skip Connection · Convolution · U-Net