Variational Autoencoder for Deep Learning of Images, Labels and Captions

Yunchen Pu; Zhe Gan; Ricardo Henao; Xin Yuan; Chunyuan Li; Andrew Stevens; Lawrence Carin

arXiv:1609.08976 · stat.ML · September 29, 2016 · 373 cites

TL;DR

This paper introduces a variational autoencoder framework that jointly models images, labels, and captions, enabling semi-supervised and unsupervised learning with efficient inference.

Contribution

It combines a deep generative deconvolutional network with a CNN encoder and integrates label and caption modeling, advancing semi-supervised and unsupervised image learning.

Findings

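01 At test time, label and caption predictions average over the distribution of latent codes, which the learned CNN encoder makes computationally efficient.

02 The framework supports semi-supervised learning, using images both with and without associated labels or captions.

03 The CNN can also be trained without supervision, on images alone.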

Abstract

A novel variational autoencoder is developed to model images, as well as associated labels or captions. The Deep Generative Deconvolutional Network (DGDN) is used as a decoder of the latent image features, and a deep Convolutional Neural Network (CNN) is used as an image encoder; the CNN is used to approximate a distribution for the latent DGDN features/code. The latent code is also linked to generative models for labels (Bayesian support vector machine) or captions (recurrent neural network). When predicting a label/caption for a new image at test, averaging is performed across the distribution of latent codes; this is computationally efficient as a consequence of the learned CNN-based encoder. Since the framework is capable of modeling the image in the presence/absence of associated labels/captions, a new semi-supervised setting is manifested for CNN learning with images; the…
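
The test-time averaging described in the abstract can be illustrated with a minimal sketch. It assumes a Gaussian approximate posterior over the latent code and uses hypothetical stand-ins throughout: a linear map in place of the CNN encoder, and a softmax classifier in place of the paper's Bayesian support vector machine. The names `encode`, `predict_label_probs`, `W_mu`, `W_logvar`, and `W_cls` are illustrative and do not come from the paper.

```python
import numpy as np

def encode(x, W_mu, W_logvar):
    """Hypothetical stand-in for the CNN encoder.

    Returns the mean and log-variance of an assumed Gaussian q(z | x).
    """
    return x @ W_mu, x @ W_logvar

def softmax(a):
    # Subtract the row-wise maximum for numerical stability.
    a = a - a.max(axis=-1, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=-1, keepdims=True)

def predict_label_probs(x, W_mu, W_logvar, W_cls, n_samples=50, rng=None):
    """Average class probabilities over latent codes sampled from q(z | x).

    Assumption: a softmax classifier replaces the Bayesian SVM used in the paper.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    mu, logvar = encode(x, W_mu, W_logvar)           # each (batch, latent)
    std = np.exp(0.5 * logvar)
    eps = rng.standard_normal((n_samples,) + mu.shape)
    z = mu + std * eps                               # (n_samples, batch, latent)
    probs = softmax(z @ W_cls)                       # (n_samples, batch, classes)
    return probs.mean(axis=0)                        # Monte Carlo average over samples
```

Because the encoder yields the parameters of q(z | x) in one forward pass, drawing many latent codes costs only cheap sampling and one classifier evaluation per sample, which is the sense in which the averaging is inexpensive in this sketch.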


Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
