Image Generation from Image Captioning -- Invertible Approach
Nandakishore S Menon, Chandramouli Kamanchi, Raghuram Bharadwaj Diddigi

TL;DR
This paper introduces an invertible neural network that is trained only on image captioning and then generates images from text by running the same network in reverse, with no additional training.
Contribution
It presents a simple invertible neural network architecture that learns a one-to-one mapping between image and text embeddings, so that training on one task (image captioning) also yields the other (image generation).
Findings
Successful training of an invertible model for image captioning
The model can generate images from text by inversion
No additional training needed for image generation
Abstract
Our work aims to build a model that performs the dual tasks of image captioning and image generation while being trained on only one of them. The central idea is to train an invertible model that learns a one-to-one mapping between the image and text embeddings. Once the invertible model is efficiently trained on one task, image captioning, the same model can generate new images for a given text through the inversion process, with no additional training. This paper proposes a simple invertible neural network architecture for this problem and presents our current findings.
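To make the inversion idea concrete, here is a minimal sketch of an invertible mapping between two embedding vectors. It is an illustration only and not the authors' architecture: it assumes additive coupling layers (one common way to build an exactly invertible network), random weights in place of trained ones, and a toy 6-dimensional embedding. The names `AdditiveCoupling`, `InvertibleMap`, `image_embedding`, and `text_embedding` are hypothetical.

```python
import math
import random


class AdditiveCoupling:
    """Additive coupling: y1 = x1, y2 = x2 + t(x1). Invertible for any t."""

    def __init__(self, dim, hidden, rng):
        assert dim % 2 == 0, "dim must be even so the vector splits in half"
        self.half = dim // 2
        # Random weights stand in for trained parameters (assumption).
        self.w1 = [[rng.gauss(0, 0.5) for _ in range(hidden)]
                   for _ in range(self.half)]
        self.w2 = [[rng.gauss(0, 0.5) for _ in range(self.half)]
                   for _ in range(hidden)]

    def _shift(self, a):
        # One-hidden-layer network t(.); it never has to be inverted itself.
        h = [math.tanh(sum(a[i] * self.w1[i][j] for i in range(self.half)))
             for j in range(len(self.w2))]
        return [sum(h[j] * self.w2[j][k] for j in range(len(h)))
                for k in range(self.half)]

    def forward(self, x):
        x1, x2 = x[:self.half], x[self.half:]
        shift = self._shift(x1)
        return x1 + [b + s for b, s in zip(x2, shift)]

    def inverse(self, y):
        y1, y2 = y[:self.half], y[self.half:]
        shift = self._shift(y1)  # y1 == x1, so the same shift is recomputed
        return y1 + [b - s for b, s in zip(y2, shift)]


class InvertibleMap:
    """Stack of coupling layers; forward and inverse share all parameters."""

    def __init__(self, dim, hidden=8, n_layers=4, seed=0):
        rng = random.Random(seed)
        self.layers = [AdditiveCoupling(dim, hidden, rng)
                       for _ in range(n_layers)]

    def forward(self, x):
        for layer in self.layers:
            # Reverse the vector after each layer so both halves get updated.
            x = layer.forward(x)[::-1]
        return x

    def inverse(self, y):
        for layer in reversed(self.layers):
            # Undo the reversal first, then undo the coupling step.
            y = layer.inverse(y[::-1])
        return y


model = InvertibleMap(dim=6)
image_embedding = [0.3, -1.2, 0.8, 0.5, -0.4, 1.1]   # toy values
text_embedding = model.forward(image_embedding)       # "captioning" direction
recovered = model.inverse(text_embedding)             # "generation" direction
max_error = max(abs(a - b) for a, b in zip(image_embedding, recovered))
print(max_error)
```

Because the inverse reuses the forward parameters, only the forward direction would need training in a setup like this; the reverse direction follows from the layer structure. A real system would additionally need an image encoder/decoder and a text encoder/decoder around the invertible core, which this sketch leaves out.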
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Video Analysis and Summarization
