Variational Attention for Sequence-to-Sequence Models

Hareesh Bahuleyan; Lili Mou; Olga Vechtomova; Pascal Poupart

arXiv:1712.08207 · cs.CL · June 25, 2018 · 20 cites

TL;DR

This paper introduces a variational attention mechanism for sequence-to-sequence models in which the attention vector is treated as a Gaussian-distributed random variable, increasing the diversity of generated sentences without loss of quality.

Contribution

It proposes a variational attention mechanism that alleviates the bypassing phenomenon in variational encoder-decoder models, increasing the diversity of generated sequences.

Findings

01 Increases the diversity of generated sentences.

02 Alleviates the bypassing phenomenon in variational models.

Abstract

The variational encoder-decoder (VED) encodes source information as a set of random variables using a neural network, which in turn is decoded into target data using another neural network. In natural language processing, sequence-to-sequence (Seq2Seq) models typically serve as encoder-decoder networks. When combined with a traditional (deterministic) attention mechanism, the variational latent space may be bypassed by the attention model, and thus becomes ineffective. In this paper, we propose a variational attention mechanism for VED, where the attention vector is also modeled as Gaussian distributed random variables. Results on two experiments show that, without loss of quality, our proposed method alleviates the bypassing phenomenon as it increases the diversity of generated sentences.
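To make the idea of a Gaussian-distributed attention vector concrete, here is a minimal NumPy sketch of one decoding step. It is an illustration under stated assumptions, not the authors' implementation: the dot-product scoring, the `w_logvar` projection used for the log-variance, the standard normal prior in the KL term, and all function and parameter names are hypothetical choices made for this example.

```python
import numpy as np


def softmax(x):
    # Numerically stable softmax over a 1-D array.
    shifted = x - np.max(x)
    exps = np.exp(shifted)
    return exps / exps.sum()


def variational_attention(enc_states, dec_state, w_logvar, rng):
    """One step of a sampled (Gaussian) attention vector.

    enc_states: (T, d) encoder hidden states
    dec_state:  (d,)   current decoder hidden state
    w_logvar:   (d, d) hypothetical projection producing the log-variance
    rng:        numpy.random.Generator used for the noise
    """
    # Deterministic attention: dot-product scores (an assumption here).
    scores = enc_states @ dec_state
    weights = softmax(scores)
    # The usual attention context is used as the Gaussian mean.
    mean = weights @ enc_states
    # Diagonal log-variance from a hypothetical bounded projection.
    log_var = np.tanh(mean @ w_logvar)
    # Reparameterization trick: sample = mean + std * noise.
    eps = rng.standard_normal(mean.shape)
    sample = mean + np.exp(0.5 * log_var) * eps
    # KL divergence from N(mean, diag(exp(log_var))) to N(0, I),
    # assuming a standard normal prior for illustration.
    kl = 0.5 * np.sum(np.exp(log_var) + mean ** 2 - 1.0 - log_var)
    return sample, mean, log_var, kl
```

In this sketch the decoder would consume `sample` rather than the deterministic context `mean`, and `kl` would be added to the training loss so that the attention path is regularized like the rest of the latent space instead of providing a deterministic route around it.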

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis