Text-to-Image-to-Text Translation using Cycle Consistent Adversarial Networks

Satya Krishna Gorti; Jeremy Ma

arXiv:1808.04538 · cs.LG · August 15, 2018 · 23 cites

TL;DR

This paper introduces a cycle consistent adversarial network for text-to-image translation. It captions the generated images and compares those captions with the input sentences, encouraging the images to reflect the meaning of the input text.

Contribution

It proposes a novel cycle consistency approach that incorporates an image captioning network to improve the semantic accuracy of images generated from text descriptions.
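
One way to write such an objective is shown below. This is an illustrative assumption, not the paper's stated loss. Here a generator G, conditioned on a sentence t and noise z, is trained against a discriminator D, and a captioning network C maps the generated image back to text to close the cycle:

```latex
\mathcal{L}_G = \mathcal{L}_{\mathrm{adv}}(G, D) + \lambda \, d\big(t,\; C(G(t, z))\big)
```

In this sketch, d is some distance between the ground-truth sentence and the caption of the generated image. λ is a hypothetical weight that trades image realism against semantic fidelity.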

Findings

1. Improved alignment between generated images and input text
2. Enhanced image quality with better semantic fidelity
3. Extensive comparison shows superiority over existing methods

Abstract

Text-to-image translation has been an active area of research in the recent past. A network that can learn the meaning of a sentence and generate an accurate image depicting it shows an ability to think more like humans. Popular methods for text-to-image translation use Generative Adversarial Networks (GANs) to generate high-quality images from text input, but the generated images do not always reflect the meaning of the input sentence. We address this issue by using a captioning network to caption the generated images and exploiting the distance between the ground-truth captions and the generated captions to further improve the network. We show extensive comparisons between our method and existing methods.
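
The caption-comparison step can be illustrated with a minimal sketch. It assumes a bag-of-words cosine distance as a stand-in for whatever caption distance the authors actually use, and it treats the adversarial term as a precomputed number. The names `caption_distance`, `generator_loss`, and the weight `lam` are hypothetical. Because this distance is not differentiable, a trainable version would need a differentiable caption loss instead, for example one computed from the captioning network's output probabilities.

```python
import math
from collections import Counter


def bag_of_words(sentence: str) -> Counter:
    """Lower-cased, whitespace-tokenized word counts (a deliberately simple stand-in)."""
    return Counter(sentence.lower().split())


def caption_distance(ground_truth: str, generated: str) -> float:
    """Cosine distance (1 - cosine similarity) between bag-of-words vectors.

    Assumption: this stands in for the paper's caption distance. It returns ~0.0
    for identical word counts and 1.0 when the captions share no words.
    """
    a, b = bag_of_words(ground_truth), bag_of_words(generated)
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    if norm == 0.0:
        return 1.0  # treat an empty caption as maximally distant
    # Clamp so floating-point rounding never yields a tiny negative distance.
    return max(0.0, 1.0 - dot / norm)


def generator_loss(adversarial_loss: float, ground_truth: str, generated: str, lam: float = 1.0) -> float:
    """Hypothetical combined objective: adversarial term plus weighted caption-cycle term."""
    return adversarial_loss + lam * caption_distance(ground_truth, generated)
```

For example, `caption_distance("a small bird with a red head", "a bird with a red head")` is small (about 0.057) because most words overlap. Two captions with no words in common score 1.0.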

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Video Analysis and Summarization