Improved Variational Autoencoders for Text Modeling using Dilated Convolutions

Zichao Yang; Zhiting Hu; Ruslan Salakhutdinov; Taylor Berg-Kirkpatrick

arXiv:1702.08139 · cs.NE · June 20, 2017 · 95 cites

TL;DR

This paper replaces the LSTM decoder of a text variational autoencoder (VAE) with a dilated CNN decoder, addressing the tendency of LSTM decoders to ignore the encoder's conditioning information, and reports better perplexity and labeling performance than LSTM-based VAEs.

Contribution

It proposes a novel dilated CNN decoder architecture for VAEs, enabling effective text generation and semi-supervised learning, with empirical performance improvements.

Findings

01 Dilated CNN decoders improve VAE text modeling performance.

02 The decoder's contextual capacity trades off against the amount of encoding information used; a suitable dilation architecture balances the two.

03 A VAE with a dilated CNN decoder outperforms LSTM-based VAEs on perplexity and labeling tasks.

Abstract

Recent work on generative modeling of text has found that variational auto-encoders (VAE) incorporating LSTM decoders perform worse than simpler LSTM language models (Bowman et al., 2015). This negative result is so far poorly understood, but has been attributed to the propensity of LSTM decoders to ignore conditioning information from the encoder. In this paper, we experiment with a new type of decoder for VAE: a dilated CNN. By changing the decoder's dilation architecture, we control the effective context from previously generated words. In experiments, we find that there is a trade-off between the contextual capacity of the decoder and the amount of encoding information used. We show that with the right decoder, VAE can outperform LSTM language models. We demonstrate perplexity gains on two datasets, representing the first positive experimental result on the use of VAE for generative…
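The abstract's point that the dilation architecture controls the effective context can be made concrete with a small calculation. The sketch below assumes a stack of causal 1-D convolutions that all share one kernel size; the function name `receptive_field` and the example dilation schedules are illustrative choices, not values taken from the paper.

```python
def receptive_field(kernel_size, dilations):
    """Count the input positions (current position plus history) that the
    top layer of a stack of causal 1-D convolutions can see.

    Assumption: every layer uses the same kernel size, and each layer with
    dilation d extends the reach by (kernel_size - 1) * d positions.
    """
    if kernel_size < 1:
        raise ValueError("kernel_size must be at least 1")
    if any(d < 1 for d in dilations):
        raise ValueError("every dilation must be at least 1")
    return 1 + sum((kernel_size - 1) * d for d in dilations)


if __name__ == "__main__":
    # Hypothetical schedules, chosen only to show how context grows.
    schedules = {
        "shallow": [1, 2, 4],
        "deep": [1, 2, 4, 8, 16],
        "undilated": [1, 1, 1],
    }
    for name, dilations in schedules.items():
        print(name, receptive_field(3, dilations))
```

Under these assumptions, a kernel size of 3 with dilations 1, 2, 4 covers 15 positions, while the same three layers without dilation cover only 7. Adding or removing dilated layers is therefore a direct knob on how much previously generated text the decoder can condition on.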

Taxonomy

Topics: Topic Modeling · Generative Adversarial Networks and Image Synthesis · Speech Recognition and Synthesis

Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory