Paraphrase Generation with Latent Bag of Words
Yao Fu, Yansong Feng, John P. Cunningham

TL;DR
This paper introduces a novel latent bag of words model for paraphrase generation that improves interpretability and effectiveness by using discrete latent variables grounded in target sentence semantics.
Contribution
The paper proposes a fully differentiable content planning and surface realization model built on a latent bag of words (BOW), improving both interpretability and performance in paraphrase generation.
Findings
The model achieves transparent and effective paraphrase generation.
Unsupervised learning of word neighbors improves semantic coherence.
Differentiable subset sampling enhances generation control.
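According to the abstract, the differentiable subset sampling is performed with Gumbel top-k reparameterization. The following is a minimal NumPy sketch of the sampling step only, not the authors' implementation: the function name `gumbel_top_k`, the toy `logits`, and the seed are illustrative assumptions, and the continuous relaxation needed to pass gradients through the sample during training is omitted.

```python
import numpy as np

def gumbel_top_k(logits, k, rng):
    """Draw k distinct indices without replacement.

    Adds independent standard Gumbel noise to each logit and keeps the
    k largest perturbed scores (the Gumbel top-k trick).
    """
    noise = rng.gumbel(loc=0.0, scale=1.0, size=logits.shape)
    perturbed = logits + noise
    # Indices of the k largest perturbed scores, highest first.
    return np.argsort(-perturbed)[:k]

# Toy example (hypothetical 5-word vocabulary): two words dominate the logits,
# so they are selected with near certainty regardless of the noise draw.
rng = np.random.default_rng(0)
logits = np.array([100.0, 100.0, 0.0, 0.0, 0.0])
chosen = gumbel_top_k(logits, k=2, rng=rng)
```

With flatter logits, repeated calls return different subsets, which is what makes the sampled bag of words a stochastic latent variable rather than a fixed top-k lookup.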
Abstract
Paraphrase generation is a longstanding important problem in natural language processing. In addition, recent progress in deep generative models has shown promising results on discrete latent variables for text generation. Inspired by variational autoencoders with discrete latent structures, in this work, we propose a latent bag of words (BOW) model for paraphrase generation. We ground the semantics of a discrete latent variable by the BOW from the target sentences. We use this latent variable to build a fully differentiable content planning and surface realization model. Specifically, we use source words to predict their neighbors and model the target BOW with a mixture of softmax. We use Gumbel top-k reparameterization to perform differentiable subset sampling from the predicted BOW distribution. We retrieve the sampled word embeddings and use them to augment the decoder…
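The abstract states that source words predict their neighbors and that the target BOW is modeled with a mixture of softmax. One way to read this is sketched below in NumPy, under stated assumptions: each source word produces its own softmax over the vocabulary, and the per-word distributions are averaged with mixture weights. The names `mixture_of_softmax`, `source_states`, `output_proj`, and `mix_logits` are hypothetical, and the way the mixture weights are obtained (here a softmax over `mix_logits`) is an assumption that the abstract does not specify.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def mixture_of_softmax(source_states, output_proj, mix_logits):
    """Combine per-source-word neighbor distributions into one BOW distribution.

    source_states: (n_src, d) hidden state per source word (assumed input)
    output_proj:   (d, vocab) projection to vocabulary logits (assumed)
    mix_logits:    (n_src,)   unnormalized mixture weights (assumed)
    """
    per_word = softmax(source_states @ output_proj, axis=-1)  # (n_src, vocab)
    weights = softmax(mix_logits)                             # (n_src,)
    return weights @ per_word                                 # (vocab,)

# Toy shapes: 3 source words, hidden size 4, vocabulary of 6 words.
rng = np.random.default_rng(0)
source_states = rng.normal(size=(3, 4))
output_proj = rng.normal(size=(4, 6))
mix_logits = np.zeros(3)  # uniform mixture weights
bow = mixture_of_softmax(source_states, output_proj, mix_logits)
```

Because each component is a valid distribution and the weights sum to one, `bow` is itself a distribution over the vocabulary, which can then be fed to a subset sampler.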
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: Interpretability
