Evaluating Paraphrastic Robustness in Textual Entailment Models
Dhruv Verma, Yash Kumar Lal, Shreyashee Sinha, Benjamin Van Durme, Adam Poliak

TL;DR
This paper introduces PaRTE, a dataset of 1,126 RTE pairs, to assess whether models maintain consistent predictions when inputs are paraphrased, revealing current models' limited robustness.
Contribution
The paper provides a new dataset and evaluation methodology to measure paraphrastic robustness in textual entailment models.
Findings
Models change predictions on 8-16% of paraphrased examples.
Contemporary RTE models lack full robustness to paraphrasing.
These inconsistencies leave room for improvement in how well models understand language.
Abstract
We present PaRTE, a collection of 1,126 pairs of Recognizing Textual Entailment (RTE) examples to evaluate whether models are robust to paraphrasing. We posit that if RTE models understand language, their predictions should be consistent across inputs that share the same meaning. We use the evaluation set to determine if RTE models' predictions change when examples are paraphrased. In our experiments, contemporary models change their predictions on 8-16% of paraphrased examples, indicating that there is still room for improvement.
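The consistency check described in the abstract can be illustrated with a minimal sketch. It assumes we already have one predicted label per original example and one per paraphrased counterpart, and it reports the fraction of pairs whose label differs. The function name `prediction_change_rate` and the toy labels are hypothetical, chosen only for illustration; this is not the authors' evaluation code, and the paper's exact metric may differ in detail.

```python
from typing import Sequence


def prediction_change_rate(original: Sequence[str], paraphrased: Sequence[str]) -> float:
    """Fraction of example pairs whose predicted label changes after paraphrasing.

    Hypothetical helper: `original[i]` and `paraphrased[i]` are assumed to be a
    model's predictions for an RTE example and for its paraphrase, respectively.
    """
    if len(original) != len(paraphrased):
        raise ValueError("Both prediction lists must have the same length.")
    if not original:
        raise ValueError("At least one example pair is required.")
    # Count pairs where the model's label differs between the two versions.
    changed = sum(o != p for o, p in zip(original, paraphrased))
    return changed / len(original)


# Toy predictions (made up for illustration, not taken from PaRTE).
original_preds = ["entailed", "not-entailed", "entailed", "entailed"]
paraphrase_preds = ["entailed", "entailed", "entailed", "not-entailed"]

rate = prediction_change_rate(original_preds, paraphrase_preds)
print(f"{rate:.0%} of predictions changed under paraphrasing")
```

A rate of 0 would mean the model is fully consistent on the evaluated pairs. Note that this measures consistency only, not accuracy: a model can be consistently wrong.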
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
