Synthesizing Novel Pairs of Image and Text
Jason Xie, Tingwen Bao

TL;DR
This paper presents strategies for generating novel image-caption pairs from existing captioning datasets. It combines generative adversarial networks (GANs) with sequence-to-sequence models, extends the approach to multiple domains, and studies image-to-text-to-image cycles.
Contribution
The paper proposes novel strategies for synthesizing image-text pairs across multiple domains using GANs, sequence models, and cycle-consistency techniques.
Findings
Effective generation of novel image-caption pairs
Cross-domain synthesis capabilities demonstrated
Cycle-consistency enhances data quality
Abstract
Generating novel pairs of image and text is a problem that combines computer vision and natural language processing. In this paper, we present strategies for generating novel image and caption pairs based on existing captioning datasets. The model takes advantage of recent advances in generative adversarial networks and sequence-to-sequence modeling. We make generalizations to generate paired samples from multiple domains. Furthermore, we study cycles -- generating from image to text then back to image and vice versa, as well as their connection with autoencoders.
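As a rough illustration of the image-to-text-to-image cycle mentioned in the abstract, and of why such a cycle resembles an autoencoder, the sketch below treats a captioner as the encoder and an image generator as the decoder, then measures how well the image is reconstructed. Everything in it is an assumption made for illustration: the random linear maps, the vector sizes, and the names make_linear_map, apply_map, and cycle_loss are hypothetical stand-ins, not the paper's GAN or sequence-to-sequence models.

```python
import random


def make_linear_map(rows, cols, rng):
    """Return a rows x cols matrix of random weights (stand-in for a trained model)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def apply_map(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def cycle_loss(image_vec, image_to_text, text_to_image):
    """Mean squared error between an image and its image -> text -> image reconstruction."""
    text_vec = apply_map(image_to_text, image_vec)  # captioner, playing the encoder role
    recon = apply_map(text_to_image, text_vec)      # generator, playing the decoder role
    return sum((a - b) ** 2 for a, b in zip(image_vec, recon)) / len(image_vec)


rng = random.Random(0)
image = [rng.uniform(-1.0, 1.0) for _ in range(8)]  # toy 8-dimensional "image"
captioner = make_linear_map(4, 8, rng)              # image (8) -> text code (4)
generator = make_linear_map(8, 4, rng)              # text code (4) -> image (8)
loss = cycle_loss(image, captioner, generator)
print(f"cycle reconstruction loss: {loss:.4f}")
```

With untrained random maps the loss is generally far from zero. Training both maps to minimize it is the autoencoder view of the cycle, with the text acting as the bottleneck representation.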
Taxonomy
Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Video Analysis and Summarization
