# Variational Hetero-Encoder Randomized GANs for Joint Image-Text Modeling

**Authors:** Hao Zhang, Bo Chen, Long Tian, Zhengjue Wang, Mingyuan Zhou

arXiv: 1905.08622 · 2020-01-09

## TL;DR

This paper introduces VHE-GAN, a deep generative model that unifies a probabilistic text decoder, a probabilistic image encoder, and a GAN for bidirectional joint image-text modeling. Its extension, VHE-raster-scan-GAN, achieves state-of-the-art results on a range of image-text tasks.

## Contribution

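The paper develops a versatile end-to-end multi-modality framework that combines a variational hetero-encoder (an image encoder paired with a text decoder) with a GAN whose image generator uses the variational posterior as its source of randomness. This enables image generation that is both multi-scale and hierarchically semantic.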

## Key findings

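- Plugging three off-the-shelf modules (a deep topic model, a ladder-structured image encoder, and StackGAN++) into VHE-GAN already yields competitive performance.
- Develops VHE-raster-scan-GAN, which generates photo-realistic images both in a multi-scale low-to-high-resolution manner and in a hierarchical-semantic coarse-to-fine fashion.
- VHE-raster-scan-GAN attains state-of-the-art results on a wide variety of image-text multi-modality learning and generation tasks.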
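The coarse-to-fine idea can be pictured as a schedule that pairs abstract topic layers with low image resolutions and specific topic layers with high resolutions. The following is a simplified, hypothetical pairing: it assumes one resolution per topic layer and numbers layers from 1 (most specific) up to the top (most abstract). It is not the paper's exact raster-scan ordering. The function name, layer count, and resolutions are illustrative assumptions.

```python
# Hypothetical sketch (not the paper's exact raster-scan order): pair topic
# layers with image resolutions so generation runs coarse-to-fine in semantics
# and low-to-high in resolution. Layer/resolution values are assumptions.

def coarse_to_fine_schedule(num_layers, resolutions):
    """Return (topic_layer, resolution) pairs in generation order.

    Layer `num_layers` is assumed to be the most abstract (top) layer and
    layer 1 the most specific; resolutions are visited in increasing order.
    """
    if num_layers != len(resolutions):
        raise ValueError("this sketch assumes one resolution per topic layer")
    layers_top_down = range(num_layers, 0, -1)
    return list(zip(layers_top_down, sorted(resolutions)))


schedule = coarse_to_fine_schedule(3, [256, 64, 128])
for layer, res in schedule:
    print(f"theta^({layer}) -> {res}x{res} image")
```

Here the most abstract layer drives the lowest-resolution output, and each more specific layer refines a higher-resolution image.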

## Abstract

For bidirectional joint image-text modeling, we develop variational hetero-encoder (VHE) randomized generative adversarial network (GAN), a versatile deep generative model that integrates a probabilistic text decoder, probabilistic image encoder, and GAN into a coherent end-to-end multi-modality learning framework. VHE randomized GAN (VHE-GAN) encodes an image to decode its associated text, and feeds the variational posterior as the source of randomness into the GAN image generator. We plug three off-the-shelf modules, including a deep topic model, a ladder-structured image encoder, and StackGAN++, into VHE-GAN, which already achieves competitive performance. This further motivates the development of VHE-raster-scan-GAN that generates photo-realistic images in not only a multi-scale low-to-high-resolution manner, but also a hierarchical-semantic coarse-to-fine fashion. By capturing and relating hierarchical semantic and visual concepts with end-to-end training, VHE-raster-scan-GAN achieves state-of-the-art performance in a wide variety of image-text multi-modality learning and generation tasks.
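The data flow described in the abstract can be sketched without any deep-learning library. The model encodes an image, decodes its associated text from the variational posterior, and reuses that posterior as the generator's noise. This is a minimal sketch under stated assumptions, not the authors' implementation. The Weibull posterior with reparameterized sampling, the Poisson bag-of-words text likelihood, all sizes and weights, and the helper names (`encode`, `sample_weibull`, `text_log_likelihood`, `generate`) are hypothetical. The discriminator and the training loop are omitted.

```python
import math
import random

# Minimal sketch of the VHE-GAN data flow: image -> posterior over topic
# proportions theta -> (a) text likelihood, (b) GAN generator input.
# ASSUMPTIONS: Weibull posterior, Poisson bag-of-words likelihood, and tiny
# hypothetical linear maps standing in for the real networks.

K, V, D = 3, 5, 4  # topics, vocabulary size, image-feature size (hypothetical)
rng = random.Random(0)

# Hypothetical parameters (rows x columns noted in comments).
W_k = [[rng.uniform(0.1, 0.5) for _ in range(D)] for _ in range(K)]    # K x D
W_lam = [[rng.uniform(0.1, 0.5) for _ in range(D)] for _ in range(K)]  # K x D
Phi = [[rng.uniform(0.1, 1.0) for _ in range(K)] for _ in range(V)]    # V x K topic-word loadings
G = [[rng.uniform(-1.0, 1.0) for _ in range(K)] for _ in range(D)]     # D x K generator weights


def softplus(x):
    return math.log1p(math.exp(x))


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def encode(image_feat):
    """Image encoder: map image features to positive Weibull shape/scale."""
    k = [softplus(z) + 1e-3 for z in matvec(W_k, image_feat)]
    lam = [softplus(z) + 1e-3 for z in matvec(W_lam, image_feat)]
    return k, lam


def sample_weibull(k, lam, rng):
    """Reparameterized Weibull draw: theta = lam * (-ln(1 - u)) ** (1 / k)."""
    return [l * (-math.log(1.0 - rng.random())) ** (1.0 / kk) for kk, l in zip(k, lam)]


def text_log_likelihood(word_counts, theta):
    """Poisson bag-of-words log-likelihood with rates Phi @ theta."""
    rates = [r + 1e-8 for r in matvec(Phi, theta)]
    return sum(c * math.log(r) - r - math.lgamma(c + 1) for c, r in zip(word_counts, rates))


def generate(theta):
    """Stand-in generator: theta is the source of randomness for the output."""
    return [math.tanh(z) for z in matvec(G, theta)]


image_feat = [0.2, -0.1, 0.5, 0.3]
word_counts = [1, 0, 2, 0, 1]
k, lam = encode(image_feat)
theta = sample_weibull(k, lam, rng)
print("theta:", [round(t, 3) for t in theta])
print("text log-likelihood:", round(text_log_likelihood(word_counts, theta), 3))
print("generated features:", [round(x, 3) for x in generate(theta)])
```

Because theta is a deterministic function of the encoder outputs and independent uniform noise, an autodiff framework could pass gradients from both the text likelihood and an adversarial loss on `generate(theta)` back into the encoder. That shared, differentiable posterior is what lets the combined model be trained end to end.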

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1905.08622/full.md

## Figures

47 figures with captions in the complete paper: https://tomesphere.com/paper/1905.08622/full.md

## References

63 references — full list in the complete paper: https://tomesphere.com/paper/1905.08622/full.md

---
Source: https://tomesphere.com/paper/1905.08622