ChatPainter: Improving Text to Image Generation using Dialogue

Shikhar Sharma; Dendi Suhubdy; Vincent Michalski; Samira Ebrahimi Kahou; Yoshua Bengio

arXiv:1802.08216 · cs.CV · February 23, 2018 · 78 cites

Open Access

TL;DR

ChatPainter conditions text-to-image generation on a dialogue that further describes the scene, in addition to the caption, which significantly improves the inception score and the quality of generated images on MS COCO.

Contribution

This work demonstrates that incorporating dialogue descriptions into text prompts improves the fidelity and accuracy of generated images compared to using captions alone.

Findings

01. Increased inception scores for generated images.

02. Improved object recognition in synthesized images.

03. Enhanced scene understanding through dialogue augmentation.

Abstract

Synthesizing realistic images from text descriptions on a dataset like Microsoft Common Objects in Context (MS COCO), where each image can contain several objects, is a challenging task. Prior work has used text captions to generate images. However, captions might not be informative enough to capture the entire image, and they may be insufficient for the model to understand which objects in the images correspond to which words in the captions. We show that adding a dialogue that further describes the scene leads to a significant improvement in the inception score and in the quality of generated images on the MS COCO dataset.

Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Advanced Image and Video Retrieval Techniques