Self-Supervised Image-to-Text and Text-to-Image Synthesis

Anindya Sundar Das; Sriparna Saha

arXiv:2112.04928·cs.CV·December 10, 2021

TL;DR

This paper introduces a self-supervised deep learning method that learns cross-modal embeddings for image-to-text and text-to-image synthesis, reducing reliance on labeled data and improving generation quality.

Contribution

It proposes a novel self-supervised approach using dense embeddings and GANs to learn cross-modal representations for both image and text generation tasks.

Findings

1. Successfully generates textual descriptions from images.
2. Generates images from textual descriptions.
3. Learns meaningful cross-modal embeddings without supervised labels.

Abstract

A comprehensive understanding of vision and language, and of their interrelation, is crucial for recognizing the underlying similarities and differences between these modalities and for learning more generalized, meaningful representations. In recent years, most work on Text-to-Image synthesis and Image-to-Text generation has focused on supervised generative deep architectures, with little attention paid to learning the similarities between the embedding spaces across modalities. In this paper, we propose a novel self-supervised deep learning approach to learning the cross-modal embedding spaces for both image-to-text and text-to-image generation. In our approach, we first obtain dense vector representations of images using a StackGAN-based autoencoder model, as well as sentence-level dense vector representations using an LSTM-based…
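The abstract says sentence-level dense vectors come from an LSTM-based encoder. Below is a minimal, dependency-free sketch of that general idea. It assumes a single-layer LSTM whose final hidden state serves as the sentence vector. The class ToyLSTMEncoder, its random untrained weights and the toy word vectors are all hypothetical. This is not the authors' implementation (the official repository is tagged PyTorch), and the excerpt above does not state the paper's exact encoder design.

```python
import math
import random
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]
GATES = ("i", "f", "o", "g")  # input, forget, output, candidate


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def matvec(w: Matrix, v: Vector) -> Vector:
    # Multiply a (rows x cols) matrix by a length-cols vector.
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


class ToyLSTMEncoder:
    """Hypothetical single-layer LSTM sentence encoder (untrained, illustration only)."""

    def __init__(self, input_dim: int, hidden_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        cols = input_dim + hidden_dim  # each gate reads the concatenation [x_t; h_{t-1}]
        # Small random weights and zero biases, one set per gate.
        self.w: Dict[str, Matrix] = {
            gate: [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(hidden_dim)]
            for gate in GATES
        }
        self.b: Dict[str, Vector] = {gate: [0.0] * hidden_dim for gate in GATES}

    def step(self, x: Vector, h: Vector, c: Vector) -> Tuple[Vector, Vector]:
        z = x + h  # list concatenation gives [x_t; h_{t-1}]
        pre = {
            gate: [a + b for a, b in zip(matvec(self.w[gate], z), self.b[gate])]
            for gate in GATES
        }
        i = [sigmoid(v) for v in pre["i"]]    # input gate
        f = [sigmoid(v) for v in pre["f"]]    # forget gate
        o = [sigmoid(v) for v in pre["o"]]    # output gate
        g = [math.tanh(v) for v in pre["g"]]  # candidate cell values
        c_new = [ft * ct + it * gt for ft, ct, it, gt in zip(f, c, i, g)]
        h_new = [ot * math.tanh(ct) for ot, ct in zip(o, c_new)]
        return h_new, c_new

    def encode(self, words: List[Vector]) -> Vector:
        # Run the cell over the word vectors; the final hidden state is the sentence vector.
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        for x in words:
            h, c = self.step(x, h, c)
        return h


# Usage with made-up 4-dimensional word vectors.
encoder = ToyLSTMEncoder(input_dim=4, hidden_dim=8)
sentence = [[0.1, 0.0, 0.3, 0.2], [0.0, 0.5, 0.1, 0.0], [0.2, 0.2, 0.0, 0.4]]
text_vec = encoder.encode(sentence)
print(len(text_vec))  # 8
```

Using the last hidden state is one common way to pool an LSTM's outputs into a single vector. Mean-pooling over all hidden states is another. Which choice the paper makes is not stated here.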

Code & Models

Repositories

anindyasdas/selfsupervisedimagetext
PyTorch · Official

Taxonomy

Methods: Tanh Activation · Sigmoid Activation · Long Short-Term Memory
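The taxonomy tags the paper with Tanh Activation, Sigmoid Activation and Long Short-Term Memory. For reference, the textbook LSTM cell combines the two activations as shown below. Here [x_t; h_{t-1}] is the concatenation of the current input with the previous hidden state, and the circled dot denotes element-wise multiplication. This is the standard formulation. Whether the paper uses a variant, such as peephole connections or a different number of layers, is not stated in this excerpt.

```latex
\sigma(x) = \frac{1}{1 + e^{-x}}, \qquad \tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}

\begin{aligned}
i_t &= \sigma\left(W_i\,[x_t; h_{t-1}] + b_i\right) \\
f_t &= \sigma\left(W_f\,[x_t; h_{t-1}] + b_f\right) \\
o_t &= \sigma\left(W_o\,[x_t; h_{t-1}] + b_o\right) \\
g_t &= \tanh\left(W_g\,[x_t; h_{t-1}] + b_g\right) \\
c_t &= f_t \odot c_{t-1} + i_t \odot g_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```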