# ParaBank: Monolingual Bitext Generation and Sentential Paraphrasing via Lexically-constrained Neural Machine Translation

**Authors:** J. Edward Hu, Rachel Rudinger, Matt Post, Benjamin Van Durme

arXiv: 1901.03644 · 2019-01-14

## TL;DR

ParaBank is a large-scale, high-quality English paraphrase dataset created using lexically-constrained neural machine translation, enabling improved paraphrasing and sentence rewriting tasks.

## Contribution

We introduce ParaBank, a large-scale English paraphrase dataset generated with lexically-constrained NMT that surpasses prior datasets in both size and quality.

## Key findings

- ParaBank contains more than 4 billion generated tokens and exhibits greater lexical diversity than prior work.
- In human judgments, ParaBank's paraphrases improve over ParaNMT on both semantic similarity and fluency.
- The dataset enables training of monolingual NMT models for sentence rewriting.

## Abstract

We present ParaBank, a large-scale English paraphrase dataset that surpasses prior work in both quantity and quality. Following the approach of ParaNMT, we train a Czech-English neural machine translation (NMT) system to generate novel paraphrases of English reference sentences. By adding lexical constraints to the NMT decoding procedure, however, we are able to produce multiple high-quality sentential paraphrases per source sentence, yielding an English paraphrase resource with more than 4 billion generated tokens and exhibiting greater lexical diversity. Using human judgments, we also demonstrate that ParaBank's paraphrases improve over ParaNMT on both semantic similarity and fluency. Finally, we use ParaBank to train a monolingual NMT model with the same support for lexically-constrained decoding for sentence rewriting tasks.
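To make the idea of lexically-constrained decoding concrete, here is a minimal sketch, not the paper's system. It assumes one simple kind of constraint (banning specific tokens from the output) and uses a hypothetical hand-written next-token table, `TOY_MODEL`, in place of a trained Czech-English NMT model. The table, its probabilities, and the function name `greedy_decode` are all invented for illustration; this summary does not describe the paper's actual constraint types or decoder.

```python
# Toy stand-in for an NMT decoder: maps the previous token to scored
# continuations. All tokens and probabilities are invented (assumption).
TOY_MODEL = {
    "<s>": {"the": 1.0},
    "the": {"film": 0.6, "movie": 0.4},
    "film": {"was": 1.0},
    "movie": {"was": 1.0},
    "was": {"great": 0.7, "excellent": 0.3},
    "great": {"</s>": 1.0},
    "excellent": {"</s>": 1.0},
}


def greedy_decode(model, banned=frozenset(), max_len=10):
    """Greedy decoding that never emits a token listed in `banned`."""
    prev, output = "<s>", []
    for _ in range(max_len):
        # Apply the lexical constraint: drop banned continuations.
        candidates = {t: p for t, p in model[prev].items() if t not in banned}
        if not candidates:
            break  # every continuation is banned; stop early
        prev = max(candidates, key=candidates.get)
        if prev == "</s>":
            break
        output.append(prev)
    return output


# Without constraints the decoder reproduces its single best output.
unconstrained = greedy_decode(TOY_MODEL)
# Banning words from that output forces a lexically different sentence.
constrained = greedy_decode(TOY_MODEL, banned={"film", "great"})

print(" ".join(unconstrained))
print(" ".join(constrained))
```

Running the decoder several times with different banned sets yields several distinct outputs for the same input, which mirrors how constraints allow multiple paraphrases per source sentence. A real system would apply such constraints inside beam search over a trained model rather than a greedy walk over a lookup table.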

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1901.03644/full.md

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/1901.03644/full.md

## References

28 references — full list in the complete paper: https://tomesphere.com/paper/1901.03644/full.md

---
Source: https://tomesphere.com/paper/1901.03644