Evaluating Paraphrastic Robustness in Textual Entailment Models

Dhruv Verma; Yash Kumar Lal; Shreyashee Sinha; Benjamin Van Durme; Adam Poliak

arXiv:2306.16722 · cs.CL · June 30, 2023

TL;DR

This paper introduces PaRTE, an evaluation set of 1,126 pairs of RTE examples, to test whether models keep their predictions consistent when inputs are paraphrased. Contemporary models turn out to be only partially robust.

Contribution

The paper provides a new dataset and evaluation methodology to measure paraphrastic robustness in textual entailment models.

Findings

01 Models change predictions on 8-16% of paraphrased examples.

02 Contemporary RTE models are not fully robust to paraphrasing.

03 There is still room for improvement in models' understanding of language.

Abstract

We present PaRTE, a collection of 1,126 pairs of Recognizing Textual Entailment (RTE) examples to evaluate whether models are robust to paraphrasing. We posit that if RTE models understand language, their predictions should be consistent across inputs that share the same meaning. We use the evaluation set to determine if RTE models' predictions change when examples are paraphrased. In our experiments, contemporary models change their predictions on 8-16% of paraphrased examples, indicating that there is still room for improvement.
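The consistency check described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function name `prediction_change_rate`, the `(premise, hypothesis)` tuple layout, the label strings, and the word-overlap `toy_predict` stand-in for a real model are all assumptions made for illustration. The sketch only shows the idea of counting how often a prediction differs between an original example and its paraphrase.

```python
from typing import Callable, Sequence, Tuple

# Assumed layout: an RTE example is a (premise, hypothesis) tuple.
Example = Tuple[str, str]


def prediction_change_rate(
    predict: Callable[[Example], str],
    pairs: Sequence[Tuple[Example, Example]],
) -> float:
    """Fraction of (original, paraphrase) pairs whose predicted labels differ."""
    if not pairs:
        raise ValueError("pairs must be non-empty")
    changed = sum(
        1 for original, paraphrase in pairs
        if predict(original) != predict(paraphrase)
    )
    return changed / len(pairs)


def toy_predict(example: Example) -> str:
    """Hypothetical stand-in for an RTE model: a naive word-overlap heuristic."""
    premise, hypothesis = example
    premise_words = set(premise.lower().split())
    hypothesis_words = set(hypothesis.lower().split())
    # Predict entailment only if every hypothesis word appears in the premise.
    return "entailed" if hypothesis_words <= premise_words else "not-entailed"


# Two made-up pairs; each holds an original example and a paraphrased version.
toy_pairs = [
    (
        ("a man is playing a guitar on stage", "a man is playing a guitar"),
        ("on stage a man strums a guitar", "a man is playing a guitar"),
    ),
    (
        ("the cat sleeps on the sofa", "the cat sleeps"),
        ("on the sofa the cat sleeps", "the cat sleeps"),
    ),
]

rate = prediction_change_rate(toy_predict, toy_pairs)
print(f"predictions changed on {rate:.0%} of pairs")
```

With a real model, `predict` would wrap the model's inference call, and the resulting rate would be the quantity the paper reports as 8-16% for contemporary models.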

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification