Variational Attention for Sequence-to-Sequence Models
Hareesh Bahuleyan, Lili Mou, Olga Vechtomova, Pascal Poupart

TL;DR
This paper introduces a variational attention mechanism for sequence-to-sequence models that treats the attention vector as a Gaussian random variable, increasing the diversity of generated sentences without sacrificing quality.
Contribution
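The paper proposes a variational attention mechanism for variational encoder-decoder models. It alleviates the bypassing problem, in which deterministic attention lets the decoder ignore the variational latent space, and thereby improves the diversity of generated sequences.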
Findings
Increases the diversity of generated sentences without loss of quality.
Alleviates the bypassing phenomenon in variational encoder-decoder models.
Abstract
The variational encoder-decoder (VED) encodes source information as a set of random variables using a neural network, which in turn is decoded into target data using another neural network. In natural language processing, sequence-to-sequence (Seq2Seq) models typically serve as encoder-decoder networks. When combined with a traditional (deterministic) attention mechanism, the variational latent space may be bypassed by the attention model, and thus becomes ineffective. In this paper, we propose a variational attention mechanism for VED, where the attention vector is also modeled as Gaussian distributed random variables. Results on two experiments show that, without loss of quality, our proposed method alleviates the bypassing phenomenon as it increases the diversity of generated sentences.
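As an illustration of the idea in the abstract, the following is a minimal NumPy sketch of an attention vector treated as a Gaussian random variable and sampled with the reparameterization trick. It is not the authors' implementation. The dot-product scoring, the linear maps `W_mu` and `W_logvar`, the standard normal prior, and the function name `variational_attention` are all assumptions made for this example.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    shifted = x - np.max(x)
    exp = np.exp(shifted)
    return exp / exp.sum()


def variational_attention(decoder_state, encoder_states, W_mu, W_logvar, rng):
    """Sample an attention vector from a diagonal Gaussian.

    Hypothetical sketch: scoring function, parameterization, and prior
    are assumptions, not details taken from the paper.

    decoder_state:  shape (d,)
    encoder_states: shape (T, d)
    W_mu, W_logvar: shape (d, d)
    Returns the sampled attention vector and the KL term to N(0, I).
    """
    # Deterministic attention: dot-product scores, then a weighted sum.
    scores = encoder_states @ decoder_state      # (T,)
    weights = softmax(scores)                    # (T,)
    context = weights @ encoder_states           # (d,)

    # Gaussian parameters computed from the deterministic context vector.
    mu = W_mu @ context
    logvar = W_logvar @ context

    # Reparameterization: sample = mu + sigma * eps, with eps ~ N(0, I).
    eps = rng.standard_normal(mu.shape)
    sample = mu + np.exp(0.5 * logvar) * eps

    # Closed-form KL(N(mu, sigma^2) || N(0, I)) for a diagonal Gaussian.
    kl = 0.5 * np.sum(np.exp(logvar) + mu ** 2 - 1.0 - logvar)
    return sample, kl
```

In a full model, the KL term would be added to the training loss alongside the KL term of the sentence-level latent variable, so that the attention path is also regularized as a stochastic variable rather than passed through deterministically.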
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
