Text-to-Image-to-Text Translation using Cycle Consistent Adversarial Networks
Satya Krishna Gorti, Jeremy Ma

TL;DR
This paper introduces a cycle consistent adversarial network that improves text-to-image translation by ensuring generated images accurately reflect input sentences through captioning and caption comparison.
Contribution
It proposes a novel cycle consistency approach that incorporates captioning to enhance the semantic accuracy of generated images from text descriptions.
Findings
Improved alignment between generated images and input text
Enhanced image quality with better semantic fidelity
Extensive comparisons against existing text-to-image methods are reported
Abstract
Text-to-image translation has been an active area of research in the recent past. A network's ability to learn the meaning of a sentence and generate an accurate image depicting it shows the model's ability to think more like humans. Popular text-to-image translation methods use Generative Adversarial Networks (GANs) to generate high-quality images from text input, but the generated images don't always reflect the meaning of the sentence given to the model as input. We address this issue by using a captioning network to caption the generated images and exploiting the distance between ground-truth captions and generated captions to improve the network further. We show extensive comparisons between our method and existing methods.
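The idea in the abstract, captioning the generated image and penalizing the distance to the ground-truth caption, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the authors' implementation: the abstract does not say which distance is used, so token-level cross-entropy is swapped in; the adversarial term is written as the common non-saturating GAN generator loss; and the names `caption_cross_entropy`, `generator_loss`, and `cycle_weight` are hypothetical.

```python
import math

EPS = 1e-12  # guards against log(0)


def caption_cross_entropy(token_probs, target_ids):
    """Mean negative log-likelihood of the ground-truth caption tokens.

    token_probs: one probability distribution over the vocabulary per
                 caption position, as a captioning network might output
                 for a generated image (assumed interface).
    target_ids:  ground-truth token indices, one per caption position.
    """
    if not target_ids or len(token_probs) != len(target_ids):
        raise ValueError("captions must be non-empty and of equal length")
    nll = 0.0
    for probs, target in zip(token_probs, target_ids):
        nll -= math.log(max(probs[target], EPS))
    return nll / len(target_ids)


def generator_loss(disc_score_fake, token_probs, target_ids, cycle_weight=1.0):
    """Adversarial term plus a weighted caption-consistency term.

    disc_score_fake: discriminator's probability that the generated
                     image is real (assumed to lie in (0, 1]).
    cycle_weight:    hypothetical trade-off hyperparameter.
    """
    adversarial = -math.log(max(disc_score_fake, EPS))
    cycle = caption_cross_entropy(token_probs, target_ids)
    return adversarial + cycle_weight * cycle


# Toy usage: a two-word vocabulary and a two-token caption.
probs = [[0.5, 0.5], [0.25, 0.75]]
loss = generator_loss(0.5, probs, [0, 1], cycle_weight=2.0)
```

Under this sketch, a generated image whose caption assigns higher probability to the ground-truth tokens yields a lower loss, which is the pressure toward semantic fidelity that the abstract describes.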
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Video Analysis and Summarization
