Recurrence Boosts Diversity! Revisiting Recurrent Latent Variable in Transformer-Based Variational AutoEncoder for Diverse Text Generation

Jinyi Hu; Xiaoyuan Yi; Wenhao Li; Maosong Sun; Xing Xie

arXiv:2210.12409·cs.CL·November 24, 2022·1 cites

Open Access

TL;DR

This paper introduces TRACE, a Transformer-based recurrent VAE that improves text generation diversity by incorporating recurrent dynamics into segment-wise latent variables, with theoretical guarantees and efficient parallel computation.

Contribution

It proposes TRACE, a recurrent VAE structure for Transformers that improves the diversity of generated text and comes with a theoretical lower bound on the KL divergence.

Findings

01  Enhanced diversity in generated text
02  Maintained high generation quality
03  Theoretical lower bound on KL divergence

Abstract

Variational Auto-Encoder (VAE) has been widely adopted in text generation. Among many variants, recurrent VAE learns token-wise latent variables with each conditioned on the preceding ones, which captures sequential variability better in the era of RNN. However, it is unclear how to incorporate such recurrent dynamics into the recently dominant Transformer due to its parallelism. In this work, we propose TRACE, a Transformer-based recurrent VAE structure. TRACE imposes recurrence on segment-wise latent variables with arbitrarily separated text segments and constructs the posterior distribution with residual parameterization. Besides, we design an acceleration method by approximating idempotent matrices, which allows parallelism while maintaining the conditional dependence of latent variables. We demonstrate that TRACE could enhance the entanglement of each segment and preceding latent…
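The recurrence and residual parameterization described in the abstract can be illustrated with a small sketch. This is not the TRACE implementation: `prior_params`, `residual_posterior`, `gaussian_kl`, and `sample_latents` are hypothetical names, the prior "network" is a fixed toy function, and the prior here conditions only on the immediately preceding latent as a simplification. The residual form assumed below (posterior mean is the prior mean plus an offset, posterior standard deviation is the prior one times a scale) is one common way to parameterize a posterior relative to its prior, and may differ from what the paper uses.

```python
import math
import random


def prior_params(prev_z):
    """Toy stand-in for a prior network (assumption): maps the previous
    segment latent to the mean and std of the next segment's prior."""
    mean = [0.5 * v for v in prev_z]
    std = [1.0 for _ in prev_z]
    return mean, std


def residual_posterior(prior_mean, prior_std, delta_mean, delta_log_std):
    """Residual parameterization (assumed form): the posterior is the prior
    shifted by delta_mean and scaled by exp(delta_log_std)."""
    post_mean = [m + d for m, d in zip(prior_mean, delta_mean)]
    post_std = [s * math.exp(d) for s, d in zip(prior_std, delta_log_std)]
    return post_mean, post_std


def gaussian_kl(post_mean, post_std, prior_mean, prior_std):
    """KL(q || p) between diagonal Gaussians, summed over dimensions."""
    total = 0.0
    for mq, sq, mp, sp in zip(post_mean, post_std, prior_mean, prior_std):
        total += math.log(sp / sq) + (sq ** 2 + (mq - mp) ** 2) / (2 * sp ** 2) - 0.5
    return total


def sample_latents(deltas, dim, rng):
    """Sample one latent per text segment, each conditioned on the previous one.

    deltas: one (delta_mean, delta_log_std) pair per segment; in a real model
    these would come from an encoder that has seen the segment's text.
    """
    z_prev = [0.0] * dim  # initial state before the first segment
    latents, kls = [], []
    for delta_mean, delta_log_std in deltas:
        p_mean, p_std = prior_params(z_prev)
        q_mean, q_std = residual_posterior(p_mean, p_std, delta_mean, delta_log_std)
        # Reparameterized sample from the posterior of this segment.
        z = [m + s * rng.gauss(0.0, 1.0) for m, s in zip(q_mean, q_std)]
        latents.append(z)
        kls.append(gaussian_kl(q_mean, q_std, p_mean, p_std))
        z_prev = z
    return latents, kls
```

The loop is inherently sequential, since each prior depends on the previous sample. That sequential dependence is the obstacle to Transformer parallelism that the abstract says the acceleration method addresses; the sketch does not attempt to reproduce that method.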

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Computational and Text Analysis Methods

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dense Connections · Softmax · Position-Wise Feed-Forward Layer · Adam · Label Smoothing · Absolute Position Encodings · Layer Normalization