Synthesizing Novel Pairs of Image and Text

Jason Xie; Tingwen Bao

arXiv:1712.06682·cs.CV·December 20, 2017

TL;DR

This paper presents strategies for generating novel image-caption pairs from existing captioning datasets using GANs and sequence-to-sequence models, extends them to paired samples from multiple domains, and studies image-to-text-to-image cycles and their connection with autoencoders.

Contribution

The paper proposes novel strategies for synthesizing image-text pairs across multiple domains using GANs, sequence models, and cycle-consistency techniques.

Findings

01

Effective generation of novel image-caption pairs

02

Cross-domain synthesis capabilities demonstrated

03

Cycle-consistency enhances data quality

Abstract

Generating novel pairs of image and text is a problem that combines computer vision and natural language processing. In this paper, we present strategies for generating novel image and caption pairs based on existing captioning datasets. The model takes advantage of recent advances in generative adversarial networks and sequence-to-sequence modeling. We make generalizations to generate paired samples from multiple domains. Furthermore, we study cycles -- generating from image to text then back to image and vice versa, as well as their connection with autoencoders.
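
The cycle idea mentioned in the abstract can be illustrated with a toy sketch. This is not the authors' model: the two mapping functions, the `LEVELS` constant, and the L1 error below are hypothetical stand-ins chosen only to show the shape of an image-to-text-to-image round trip. One plausible reading of the autoencoder connection is that the caption plays the role of the bottleneck code, so the round-trip error behaves like a reconstruction loss.

```python
from typing import Callable, List

Image = List[float]   # toy "image": pixel intensities in [0, 1]
Caption = List[int]   # toy "caption": a sequence of token ids

LEVELS = 4  # hypothetical granularity of the toy token vocabulary


def toy_image_to_text(image: Image) -> Caption:
    """Hypothetical stand-in for a captioning model: quantize pixels to token ids."""
    return [round(p * LEVELS) for p in image]


def toy_text_to_image(caption: Caption) -> Image:
    """Hypothetical stand-in for a text-conditioned generator: token ids back to pixels."""
    return [t / LEVELS for t in caption]


def cycle_reconstruction_error(
    image: Image,
    to_text: Callable[[Image], Caption],
    to_image: Callable[[Caption], Image],
) -> float:
    """Mean absolute error between an image and its image -> text -> image round trip."""
    if not image:
        raise ValueError("image must contain at least one pixel")
    reconstructed = to_image(to_text(image))
    if len(reconstructed) != len(image):
        raise ValueError("round trip changed the image size")
    return sum(abs(a - b) for a, b in zip(image, reconstructed)) / len(image)


if __name__ == "__main__":
    # Pixels that sit exactly on the quantization grid survive the round trip.
    on_grid = [0.0, 0.25, 0.5, 1.0]
    print(cycle_reconstruction_error(on_grid, toy_image_to_text, toy_text_to_image))

    # Off-grid pixels lose information in the discrete "caption" bottleneck.
    off_grid = [0.1, 0.3]
    print(cycle_reconstruction_error(off_grid, toy_image_to_text, toy_text_to_image))
```

In a trained system the two toy functions would be replaced by learned models, and the reverse cycle (text to image to text) would need a distance defined over captions instead of pixels.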

Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Video Analysis and Summarization