Dual Adversarial Inference for Text-to-Image Synthesis

Qicheng Lao, Mohammad Havaei, Ahmad Pesaranghader, Francis Dutil, Lisa Di Jorio, Thomas Fevens

arXiv:1908.05324 · cs.CV · August 16, 2019 · 5 cites

TL;DR

This paper introduces a dual adversarial inference framework for text-to-image synthesis that disentangles content and style in the latent space, leading to improved image quality and meaningful style representations.

Contribution

The paper proposes a novel dual adversarial inference method that learns disentangled content and style representations, without supervision, for text-to-image synthesis.

Findings

1. Learned meaningful style representations that are not described in the text
2. Enhanced image quality on the Oxford-102, CUB, and COCO datasets
3. Unsupervised disentanglement of content and style variables

Abstract

Synthesizing images from a given text description involves two types of information: the content, which is explicitly described in the text (e.g., color, composition), and the style, which is usually not well described in the text (e.g., location, quantity, size). However, previous works typically treat the task as generating images from the content alone, without learning meaningful style representations. In this paper, we aim to learn two variables that are disentangled in the latent space, representing content and style respectively. We achieve this by augmenting current text-to-image synthesis frameworks with a dual adversarial inference mechanism. Through extensive experiments, we show that our model learns, in an unsupervised manner, style representations corresponding to certain meaningful information present…
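To make the idea of adversarial inference over two latent variables more concrete, here is a minimal NumPy sketch of ALI/BiGAN-style adversarial inference applied separately to a content variable `c` and a style variable `z`. It is not the authors' implementation: the single-layer networks, the dimensions, and names such as `dual_inference_losses`, `disc_c`, and `disc_z` are hypothetical, and the sketch only computes forward-pass losses (no training loop). Under these assumptions, content is derived from a text embedding, style is drawn from a standard normal prior, an encoder infers both from real images, and each discriminator judges joint (image, latent) pairs coming from the generator versus the encoder.

```python
import numpy as np

# Hypothetical sketch of adversarial inference over two latents (content, style).
# Not the paper's architecture; every name and size below is illustrative.
rng = np.random.default_rng(0)
TEXT_DIM, C_DIM, Z_DIM, X_DIM = 16, 8, 4, 32


def linear(in_dim, out_dim):
    """Return a randomly initialised (weights, bias) pair for a dense layer."""
    return rng.normal(0.0, 0.1, size=(in_dim, out_dim)), np.zeros(out_dim)


def dense(layer, x):
    w, b = layer
    return x @ w + b


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def bce(p, target, eps=1e-7):
    """Mean binary cross-entropy of probabilities p against a scalar target."""
    p = np.clip(p, eps, 1 - eps)
    return float(-np.mean(target * np.log(p) + (1 - target) * np.log(1 - p)))


# Toy single-layer networks (hypothetical).
text_to_content = linear(TEXT_DIM, C_DIM)   # text embedding -> content c
generator = linear(C_DIM + Z_DIM, X_DIM)    # (c, z) -> image x
encoder_c = linear(X_DIM, C_DIM)            # image -> inferred content c_hat
encoder_z = linear(X_DIM, Z_DIM)            # image -> inferred style z_hat
disc_c = linear(X_DIM + C_DIM, 1)           # D_c(x, c): scores image/content pairs
disc_z = linear(X_DIM + Z_DIM, 1)           # D_z(x, z): scores image/style pairs


def dual_inference_losses(text_emb, real_x):
    """Return (discriminator_loss, generator_encoder_loss) for one batch."""
    n = text_emb.shape[0]
    # Generative direction: content from text, style from the N(0, I) prior.
    c = np.tanh(dense(text_to_content, text_emb))
    z = rng.standard_normal((n, Z_DIM))
    fake_x = np.tanh(dense(generator, np.concatenate([c, z], axis=1)))

    # Inference direction: encode real images into content and style.
    c_hat = np.tanh(dense(encoder_c, real_x))
    z_hat = dense(encoder_z, real_x)

    def joint_score(disc, x, latent):
        return sigmoid(dense(disc, np.concatenate([x, latent], axis=1)))

    d_loss, g_loss = 0.0, 0.0
    # One discriminator per latent variable.
    for disc, gen_latent, enc_latent in ((disc_c, c, c_hat), (disc_z, z, z_hat)):
        s_enc = joint_score(disc, real_x, enc_latent)
        s_gen = joint_score(disc, fake_x, gen_latent)
        # Discriminator: label encoder pairs 1 and generator pairs 0.
        d_loss += bce(s_enc, 1.0) + bce(s_gen, 0.0)
        # Generator and encoder: try to make the discriminator swap the labels.
        g_loss += bce(s_enc, 0.0) + bce(s_gen, 1.0)
    return d_loss, g_loss
```

Calling `dual_inference_losses(text_emb, real_x)` with arrays of shape `(batch, TEXT_DIM)` and `(batch, X_DIM)` returns the summed discriminator loss and generator/encoder loss; in a trained system these would be minimized alternately with gradient updates. The design choice illustrated here is one discriminator per latent variable, so the encoder's content and style estimates are each matched against the generator's separately.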

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Video Analysis and Summarization · Multimodal Machine Learning Applications