Extracting and filtering paraphrases by bridging natural language inference and paraphrasing

Matej Klemen, Marko Robnik-Šikonja

arXiv:2111.07119·cs.CL·November 16, 2021

TL;DR

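This paper introduces a method that uses natural language inference (NLI) to extract paraphrasing datasets from NLI datasets and to clean existing paraphrasing datasets. Evaluation with pretrained transformer models shows that the extracted datasets are of high quality and reveals substantial noise in two existing paraphrasing datasets.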

Contribution

It proposes a bidirectional entailment approach, in which two sentences that entail each other are treated as paraphrases, to extract paraphrases from NLI datasets and to clean existing paraphrasing datasets. The extracted datasets are shown to be of high quality.

Findings

1. High-quality paraphrasing datasets extracted
2. Significant noise detected in existing datasets
3. Transformer models effective in evaluation

Abstract

Paraphrasing is a useful natural language processing task that can contribute to more diverse generated or translated texts. Natural language inference (NLI) and paraphrasing share some similarities and can benefit from a joint approach. We propose a novel methodology for the extraction of paraphrasing datasets from NLI datasets and cleaning existing paraphrasing datasets. Our approach is based on bidirectional entailment; namely, if two sentences can be mutually entailed, they are paraphrases. We evaluate our approach using several large pretrained transformer language models in the monolingual and cross-lingual setting. The results show high quality of extracted paraphrasing datasets and surprisingly high noise levels in two existing paraphrasing datasets.
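
The bidirectional entailment rule from the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Entails` interface, the function names, and the `toy_entails` token-overlap check are all assumptions made for illustration. In practice the `entails` argument would presumably wrap a pretrained NLI classifier rather than the toy rule used here.

```python
from typing import Callable, Iterable, List, Tuple

# Assumed interface (not from the paper): True if `premise` entails `hypothesis`.
Entails = Callable[[str, str], bool]


def is_paraphrase(first: str, second: str, entails: Entails) -> bool:
    """Bidirectional entailment: each sentence must entail the other."""
    return entails(first, second) and entails(second, first)


def extract_paraphrases(
    pairs: Iterable[Tuple[str, str]], entails: Entails
) -> List[Tuple[str, str]]:
    """Keep only the sentence pairs that entail each other in both directions."""
    return [(a, b) for a, b in pairs if is_paraphrase(a, b, entails)]


def toy_entails(premise: str, hypothesis: str) -> bool:
    """Toy stand-in for an NLI model: every hypothesis token occurs in the premise."""
    premise_tokens = set(premise.lower().split())
    return all(token in premise_tokens for token in hypothesis.lower().split())


pairs = [
    ("a man sleeps on a couch", "on a couch a man sleeps"),  # entailed both ways
    ("a man sleeps on a couch", "a man sleeps"),              # entailed one way only
]
kept = extract_paraphrases(pairs, toy_entails)
print(kept)
```

The same check can be read as a filter for an existing paraphrasing dataset: a pair that fails entailment in either direction is a candidate for removal as noise.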

Code & Models

Repositories

matejklemen/paraphrase-nli (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications