Improving the Diversity of Unsupervised Paraphrasing with Embedding Outputs

Monisha Jegadeesan; Sachin Kumar; John Wieting; Yulia Tsvetkov

arXiv:2110.13231·cs.CL·October 27, 2021

TL;DR

This paper introduces a multilingual, zero-shot paraphrasing model that leverages embedding outputs and an autoencoding training approach to enhance diversity and fluency in paraphrase generation across languages.

Contribution

The paper proposes an end-to-end multilingual model that replaces the final softmax layer with word-embedding outputs and is trained on translated parallel corpora with an additional autoencoding objective, enabling effective parameter sharing across languages.

Findings

01 Outperforms zero-shot paraphrasing baselines on two languages

02 Achieves higher diversity and fluency in generated paraphrases

03 Validated through computational metrics and human evaluation

Abstract

We present a novel technique for zero-shot paraphrase generation. The key contribution is an end-to-end multilingual paraphrasing model that is trained using translated parallel corpora to generate paraphrases into "meaning spaces" -- replacing the final softmax layer with word embeddings. This architectural modification, plus a training procedure that incorporates an autoencoding objective, enables effective parameter sharing across languages for more fluent monolingual rewriting, and facilitates fluency and diversity in generation. Our continuous-output paraphrase generation models outperform zero-shot paraphrasing baselines when evaluated on two languages using a battery of computational metrics as well as in human assessment.
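
The core architectural idea in the abstract, emitting a vector in word-embedding space instead of a softmax distribution over the vocabulary, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the toy three-dimensional `EMBEDDINGS` table, the helper names, and the choice of cosine distance as the training signal are hypothetical and are not taken from the paper or its repository.

```python
import math

# Hypothetical toy embedding table; a real model would use pretrained
# word embeddings with hundreds of dimensions.
EMBEDDINGS = {
    "quick": [0.9, 0.1, 0.0],
    "fast": [0.8, 0.2, 0.1],
    "slow": [-0.9, 0.1, 0.0],
    "dog": [0.0, 0.1, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cosine_loss(predicted, target_word):
    """Assumed training signal: distance between the predicted vector and
    the target word's embedding (0 when they point the same way)."""
    return 1.0 - cosine(predicted, EMBEDDINGS[target_word])


def nearest_words(predicted, k=2):
    """Decode a predicted vector by ranking vocabulary words by similarity.

    No softmax over the vocabulary is computed: the decoder output is
    compared directly against the embedding table.
    """
    ranked = sorted(
        EMBEDDINGS,
        key=lambda word: cosine(predicted, EMBEDDINGS[word]),
        reverse=True,
    )
    return ranked[:k]


# Stand-in for one decoder step's output vector.
predicted = [1.0, 0.0, 0.0]
print(nearest_words(predicted))
print(cosine_loss(predicted, "quick"), cosine_loss(predicted, "slow"))
```

In this sketch, words with similar meanings sit close together in the output space, so a single predicted vector has several plausible nearest neighbors. That property is one intuition for why embedding outputs might support more varied rewrites, though the sketch does not reproduce the paper's training procedure or decoding strategy.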

Code & Models

Repositories

monisha-jega/paraphrasing_embedding_outputs
PyTorch · Official

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification

Methods: Softmax