CIGLI: Conditional Image Generation from Language & Image

Xiaopeng Lu; Lynnette Ng; Jared Fernandez; Hao Zhu

arXiv:2108.08955·cs.CV·August 23, 2021


TL;DR

This paper introduces CIGLI, a new task for generating an image from a textual description combined with an image prompt, along with a dedicated dataset and a language-image fusion model that outperforms two established baselines.

Contribution

The paper presents a novel task, a dedicated dataset, and a fusion model for generating images from both text and image inputs, advancing multi-modal generation research.

Findings

01

The fusion model outperforms baseline methods in automatic and human evaluations.

02

A new dataset ensures that each description draws on information from both images, so the description alone is insufficient to generate the target image.

03

The approach improves multi-modal image generation quality.

Abstract

Multi-modal generation has been widely explored in recent years. Current research directions involve generating text based on an image or vice versa. In this paper, we propose a new task called CIGLI: Conditional Image Generation from Language and Image. Instead of generating an image based on text alone, as in text-to-image generation, this task requires the generation of an image from a textual description and an image prompt. We designed a new dataset to ensure that the text description describes information from both images, and that solely analyzing the description is insufficient to generate an image. We then propose a novel language-image fusion model which improves performance over two established baseline methods, as evaluated by quantitative (automatic) and qualitative (human) evaluations. The code and dataset are available at https://github.com/vincentlux/CIGLI.
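To make the task's input/output shape concrete, here is a minimal sketch of what a single example and a fused conditioning vector might look like. It is not the authors' fusion model: the `CigliExample` class, the `embed_text` toy encoder, the `fuse` function, and all values are hypothetical names and data introduced only for illustration, and plain concatenation stands in for whatever fusion the paper actually uses.

```python
from dataclasses import dataclass
from typing import List

Vector = List[float]


@dataclass
class CigliExample:
    """Hypothetical sample: a description plus an image prompt, with a target image."""
    description: str
    prompt_image_features: Vector  # stand-in for an encoded image prompt
    target_image_id: str           # stand-in for the image to be generated


def embed_text(text: str, dim: int = 4) -> Vector:
    """Toy character-sum embedding; a real system would use a trained text encoder."""
    vec = [0.0] * dim
    for i, ch in enumerate(text.lower()):
        vec[i % dim] += ord(ch) / 1000.0
    return vec


def fuse(text_vec: Vector, image_vec: Vector) -> Vector:
    """Concatenation fusion (an assumption): the result carries both modalities."""
    return list(text_vec) + list(image_vec)


# Illustrative values only; not taken from the CIGLI dataset.
example = CigliExample(
    description="the same street, but at night",
    prompt_image_features=[0.2, 0.5, 0.1],
    target_image_id="hypothetical_target",
)

# A generator would be conditioned on this vector rather than on text alone.
condition = fuse(embed_text(example.description), example.prompt_image_features)
print(len(condition))
```

The point of the sketch is that the conditioning signal contains the image prompt as well as the description, which matches the task's requirement that the description alone is not enough to produce the target image.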


Code & Models

Repositories

vincentlux/cigli (official)


Taxonomy

Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Topic Modeling