Improving the Diversity of Unsupervised Paraphrasing with Embedding Outputs
Monisha Jegadeesan, Sachin Kumar, John Wieting, Yulia Tsvetkov

TL;DR
This paper introduces a multilingual, zero-shot paraphrasing model that leverages embedding outputs and an autoencoding training approach to enhance diversity and fluency in paraphrase generation across languages.
Contribution
The paper proposes an end-to-end multilingual paraphrasing model that replaces the final softmax layer with word-embedding outputs. It is trained on translated parallel corpora with an added autoencoding objective, which enables parameter sharing across languages.
Findings
Outperforms zero-shot paraphrasing baselines on two languages
Achieves higher diversity and fluency in generated paraphrases
Validated through computational metrics and human evaluation
Abstract
We present a novel technique for zero-shot paraphrase generation. The key contribution is an end-to-end multilingual paraphrasing model that is trained using translated parallel corpora to generate paraphrases into "meaning spaces" -- replacing the final softmax layer with word embeddings. This architectural modification, plus a training procedure that incorporates an autoencoding objective, enables effective parameter sharing across languages for more fluent monolingual rewriting, and facilitates fluency and diversity in generation. Our continuous-output paraphrase generation models outperform zero-shot paraphrasing baselines when evaluated on two languages using a battery of computational metrics as well as in human assessment.
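The idea of replacing the softmax layer with word embeddings can be illustrated with a minimal sketch. Instead of scoring every vocabulary item, the decoder emits one continuous vector per step, which is then mapped to the nearest entry in an embedding table. Everything below is an assumption made for illustration: the toy `EMBEDDINGS` table, the helper names `cosine_loss` and `decode`, and the use of cosine distance as the training signal. The abstract does not state which loss the authors use.

```python
import math

# Toy embedding table (hypothetical values, for illustration only).
EMBEDDINGS = {
    "quick": [0.9, 0.1, 0.0],
    "fast": [0.8, 0.2, 0.1],
    "slow": [-0.9, 0.1, 0.0],
    "car": [0.0, 0.1, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cosine_loss(predicted, target_word):
    """Assumed training signal: cosine distance between the predicted
    vector and the embedding of the reference word."""
    return 1.0 - cosine(predicted, EMBEDDINGS[target_word])


def decode(predicted, k=1):
    """Return the k vocabulary words whose embeddings are most similar
    to the predicted vector (nearest-neighbour lookup, no softmax)."""
    ranked = sorted(
        EMBEDDINGS,
        key=lambda word: cosine(predicted, EMBEDDINGS[word]),
        reverse=True,
    )
    return ranked[:k]


# Stand-in for the decoder's continuous output at a single time step.
predicted = [0.85, 0.15, 0.05]
print(decode(predicted, k=2))
print(cosine_loss(predicted, "quick"))
```

In this sketch, words with similar meanings sit close together in the output space, so a predicted vector near "quick" is also near "fast". Returning more than one neighbour is one simple way to surface alternative word choices; whether the paper decodes this way is not stated in the abstract.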
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
Methods: Softmax
