Vector Learning for Cross Domain Representations
Shagan Sah, Chi Zhang, Thang Nguyen, Dheeraj Kumar Peri, Ameya Shringi, Raymond Ptucha

TL;DR
This paper introduces a novel framework that uses caption-based vector representations to generate images conditioned on natural language, leveraging sequence-to-sequence models and synthetic paraphrases for improved semantic accuracy.
Contribution
It presents a new image generation approach that conditions on captions and uses synthetic paraphrases, reducing reliance on the very large datasets and complex training that GANs typically require.
Findings
Images generated from multiple captions better capture semantic meaning.
Synthetic caption paraphrases improve image generation quality.
The method leverages existing caption datasets for enhanced image synthesis.
Abstract
Recently, generative adversarial networks have gained a lot of popularity for image generation tasks. However, such models are associated with complex learning mechanisms and demand very large relevant datasets. This work borrows concepts from image and video captioning models to form an image generative framework. The model is trained in a similar fashion to a recurrent captioning model and uses the learned weights for image generation. This is done in the inverse direction, where the input is a caption and the output is an image. The vector representations of the sentence and frames are extracted from an encoder-decoder model which is initially trained on similar sentence and image pairs. Our model conditions image generation on a natural language caption. We leverage a sequence-to-sequence model to generate synthetic captions that have the same meaning for having a robust image…
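The caption-to-image direction described in the abstract can be illustrated with a toy sketch. This is not the authors' model: the hashed word vectors stand in for a trained recurrent encoder, the fixed random projection stands in for a learned image decoder, and averaging the vectors of several captions is only one assumed way to combine paraphrases. All names (`encode_caption`, `pool_captions`, `decode_image`, `DIM`, `SIDE`) are hypothetical.

```python
import hashlib
import math
import random

DIM = 16   # size of the caption vector (hypothetical choice)
SIDE = 4   # the toy "image" is SIDE x SIDE grayscale pixels


def word_vector(word, dim=DIM):
    """Deterministic pseudo-embedding for a word (stand-in for learned weights)."""
    seed = int(hashlib.sha256(word.lower().encode("utf-8")).hexdigest(), 16)
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def encode_caption(caption, dim=DIM):
    """Mean of word vectors; a stand-in for a recurrent encoder's final state."""
    words = caption.split()
    if not words:
        raise ValueError("caption must contain at least one word")
    vecs = [word_vector(w, dim) for w in words]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def pool_captions(captions, dim=DIM):
    """Average the vectors of several captions describing the same image.

    Assumption: simple averaging; the paper may combine captions differently.
    """
    vecs = [encode_caption(c, dim) for c in captions]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def decode_image(vector, side=SIDE):
    """Toy decoder: fixed random linear projection followed by a sigmoid."""
    rng = random.Random(0)  # fixed seed so the "weights" are reproducible
    pixels = []
    for _ in range(side * side):
        weights = [rng.uniform(-1.0, 1.0) for _ in vector]
        activation = sum(w * v for w, v in zip(weights, vector))
        pixels.append(1.0 / (1.0 + math.exp(-activation)))
    return [pixels[r * side:(r + 1) * side] for r in range(side)]


captions = [
    "a small bird sitting on a branch",
    "a little bird perched on a tree limb",  # paraphrase of the first caption
]
image = decode_image(pool_captions(captions))
```

Here `image` is a 4 x 4 nested list of values between 0 and 1. In a real system both the encoder and the decoder would be trained on paired captions and images, and the paraphrases would come from a sequence-to-sequence model rather than being written by hand.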
