Image Generation from Image Captioning -- Invertible Approach

Nandakishore S Menon; Chandramouli Kamanchi; Raghuram Bharadwaj Diddigi

arXiv:2410.20171 · cs.CV · October 29, 2024

TL;DR

This paper introduces an invertible neural network that performs both image captioning and image generation after being trained only once, for captioning. Because the network learns a bidirectional mapping between images and text, running it in reverse generates images from text without extra training.

Contribution

It presents a novel invertible neural network architecture that learns a one-to-one mapping between image and text embeddings, so that training on one task is enough to perform both.
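
The summary does not describe the architecture beyond calling it a simple invertible neural network. As a hedged illustration of how a network can be invertible by construction, the sketch below uses RealNVP-style affine coupling layers, a standard invertible design chosen here as an assumption, not necessarily the authors' choice. The class names `AffineCoupling` and `InvertibleNet`, the embedding size, and the untrained random weights are all hypothetical.

```python
import numpy as np


class AffineCoupling:
    """Hypothetical RealNVP-style affine coupling layer on d-dim embeddings.

    Splits x into halves (x1, x2); x1 passes through unchanged and
    parameterizes a scale and shift applied to x2. Because x1 is untouched,
    the inverse can recompute the same scale/shift and undo them exactly.
    """

    def __init__(self, dim, hidden=32, seed=0):
        assert dim % 2 == 0, "dimension must be even to split in half"
        rng = np.random.default_rng(seed)
        self.half = dim // 2
        # Small random MLP weights (untrained here; training would fit them).
        self.W1 = rng.normal(scale=0.1, size=(self.half, hidden))
        self.W_s = rng.normal(scale=0.1, size=(hidden, self.half))
        self.W_t = rng.normal(scale=0.1, size=(hidden, self.half))

    def _scale_shift(self, x1):
        h = np.tanh(x1 @ self.W1)
        # tanh bounds the log-scale, so exp(log_s) is never zero or huge.
        log_s = np.tanh(h @ self.W_s)
        t = h @ self.W_t
        return log_s, t

    def forward(self, x):
        x1, x2 = x[..., :self.half], x[..., self.half:]
        log_s, t = self._scale_shift(x1)
        y2 = x2 * np.exp(log_s) + t
        return np.concatenate([x1, y2], axis=-1)

    def inverse(self, y):
        y1, y2 = y[..., :self.half], y[..., self.half:]
        log_s, t = self._scale_shift(y1)  # same inputs as in forward
        x2 = (y2 - t) * np.exp(-log_s)
        return np.concatenate([y1, x2], axis=-1)


class InvertibleNet:
    """Stack of coupling layers. Coordinates are reversed between layers so
    the half left untouched by one layer is transformed by the next."""

    def __init__(self, dim, n_layers=4):
        self.layers = [AffineCoupling(dim, seed=i) for i in range(n_layers)]

    def forward(self, x):  # e.g. image embedding -> text embedding
        for layer in self.layers:
            x = layer.forward(x)
            x = x[..., ::-1]  # fixed permutation: reverse coordinates
        return x

    def inverse(self, y):  # e.g. text embedding -> image embedding
        for layer in reversed(self.layers):
            y = y[..., ::-1]  # reversal is its own inverse
            y = layer.inverse(y)
        return y


net = InvertibleNet(dim=8)
x = np.random.default_rng(42).normal(size=(5, 8))
y = net.forward(x)
print(np.allclose(net.inverse(y), x))  # True
```

Each layer only undoes an elementwise scale and shift, so the inverse costs the same as the forward pass and needs no iterative solver.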

Findings

01. Successful training of an invertible model for image captioning

02. The model can generate images from text by inversion

03. No additional training needed for image generation

Abstract

Our work aims to build a model that performs the dual tasks of image captioning and image generation while being trained on only one of them. The central idea is to train an invertible model that learns a one-to-one mapping between image and text embeddings. Once the invertible model has been trained on one task, image captioning, the same model can generate new images for a given text through the inversion process, with no additional training. This paper proposes a simple invertible neural network architecture for this problem and presents our current findings.
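
The train-one-direction, invert-the-other idea can be illustrated with the simplest invertible model, a square linear map between embedding spaces. This is a deliberate simplification and an assumption, not the paper's architecture: the synthetic embeddings, the dimension `d`, and the helper names `caption_embedding` and `generate_image_embedding` are hypothetical, and a real system would also need encoders, plus decoders from embeddings back to pixels and words.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 16, 200

# Synthetic paired embeddings (hypothetical): image embeddings X and text
# embeddings Y related by an unknown invertible map plus a little noise.
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))             # random orthogonal matrix
A_true = Q @ np.diag(rng.uniform(0.5, 2.0, size=d))      # singular values in [0.5, 2]
X = rng.normal(size=(n, d))
Y = X @ A_true.T + 0.01 * rng.normal(size=(n, d))

# Train the "captioning" direction only: fit W so that Y ≈ X @ W.
W, *_ = np.linalg.lstsq(X, Y, rcond=None)


def caption_embedding(x):
    """Image embedding -> text embedding (the trained direction)."""
    return x @ W


def generate_image_embedding(y):
    """Text embedding -> image embedding by inverting W, with no extra
    training. Solves x @ W = y, i.e. W.T @ x.T = y.T."""
    return np.linalg.solve(W.T, y.T).T


x_new = rng.normal(size=(1, d))
y_new = caption_embedding(x_new)
print(np.allclose(generate_image_embedding(y_new), x_new))  # True
```

A linear map is used only because its inverse is a single linear solve; a nonlinear invertible network (for example, stacked coupling layers) would play the same role with a more expressive mapping.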

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Video Analysis and Summarization