Improving the Natural Language Inference robustness to hard dataset by data augmentation and preprocessing

Zijiang Yang

arXiv:2412.07108·cs.CL·December 11, 2024

Open Access

TL;DR

This paper proposes data augmentation and preprocessing techniques to enhance the robustness of Natural Language Inference models against hard, out-of-distribution datasets, addressing issues like word overlap and numerical reasoning.

Contribution

It introduces general data augmentation and preprocessing methods that improve NLI model robustness without relying on test data distribution.

Findings

01 Enhanced model performance on hard datasets

02 Reduced reliance on spurious correlations

03 Improved handling of numerical reasoning and length mismatch

Abstract

Natural Language Inference (NLI) is the task of inferring whether a hypothesis can be justified by a given premise. Concretely, the hypothesis is classified into one of three labels (entailment, neutral, or contradiction) given the premise. NLI has been well studied by previous researchers, and a number of models, especially transformer-based ones, have achieved significant improvements on this task. However, these models are reported to struggle on hard datasets. In particular, they perform much worse on unseen, out-of-distribution premise-hypothesis pairs, which suggests that they may learn spurious correlations rather than the semantic content. In this work, we propose data augmentation and preprocessing methods to address the word overlap, numerical reasoning, and length mismatch problems. These methods are general methods that do not rely on the…
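To make the word overlap problem named in the abstract concrete, here is a minimal Python sketch of the kind of spurious shortcut a model might pick up. It is an illustration only and not the paper's method: the function names `overlap_ratio` and `overlap_heuristic`, the regex tokenizer, the 0.9 threshold, and the example sentence pair are all assumptions chosen for this sketch.

```python
import re

# The three NLI labels described in the abstract.
LABELS = ("entailment", "neutral", "contradiction")


def tokenize(text):
    """Lowercase and split into alphanumeric tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def overlap_ratio(premise, hypothesis):
    """Fraction of hypothesis tokens that also appear in the premise."""
    hyp_tokens = tokenize(hypothesis)
    if not hyp_tokens:
        return 0.0
    premise_vocab = set(tokenize(premise))
    return sum(tok in premise_vocab for tok in hyp_tokens) / len(hyp_tokens)


def overlap_heuristic(premise, hypothesis, threshold=0.9):
    """Hypothetical shortcut: predict 'entailment' whenever overlap is high.

    This mimics a spurious correlation, not a real NLI model.
    """
    if overlap_ratio(premise, hypothesis) >= threshold:
        return "entailment"
    return "neutral"


# Every hypothesis word occurs in the premise, yet the premise does not
# entail the hypothesis: it is the doctor who danced, not the actor.
premise = "The doctor near the actor danced."
hypothesis = "The actor danced."

print(overlap_ratio(premise, hypothesis))
print(overlap_heuristic(premise, hypothesis))
```

On this pair the shortcut predicts entailment even though the hypothesis does not follow from the premise, which is the kind of failure that appears when a model relies on lexical overlap instead of meaning.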

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Data Quality and Management