Generalization in NLI: Ways (Not) To Go Beyond Simple Heuristics

Prajjwal Bhargava; Aleksandr Drozd; Anna Rogers

arXiv:2110.01518 · cs.CL · October 5, 2021

TL;DR

This paper studies how BERT-based models trained on MNLI generalize to the adversarially constructed HANS dataset. It tests adapters, Siamese Transformers, HEX debiasing, data subsampling, and larger model size, and reports that two of the strategies help and three do not.

Contribution

A case study of generalization in NLI, from MNLI to HANS, that compares several BERT-based architectures and training strategies and reports which ones succeed and which fail.

Findings

01. Some strategies improve generalization to adversarial datasets.

02. Certain approaches fail to enhance robustness.

03. Both outcomes give insights into how Transformer models learn to generalize.

Abstract

Much of recent progress in NLU was shown to be due to models' learning dataset-specific heuristics. We conduct a case study of generalization in NLI (from MNLI to the adversarially constructed HANS dataset) in a range of BERT-based architectures (adapters, Siamese Transformers, HEX debiasing), as well as with subsampling the data and increasing the model size. We report 2 successful and 3 unsuccessful strategies, all providing insights into how Transformer-based models learn to generalize.
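To make the MNLI-to-HANS evaluation concrete: MNLI uses three labels (entailment, neutral, contradiction), while HANS uses two (entailment, non-entailment), so predictions from an MNLI-trained model are usually collapsed before scoring. The sketch below illustrates that collapsing step and a per-heuristic accuracy breakdown. It is a minimal illustration, not code from the paper's repository; the function names, the label strings, and the toy data are assumptions made for this example.

```python
from collections import defaultdict

# Assumed label names; the label order of a real MNLI classifier may differ.
MNLI_LABELS = ("entailment", "neutral", "contradiction")


def collapse_to_hans(mnli_label: str) -> str:
    """Map a three-way MNLI label onto the two-way HANS label set."""
    if mnli_label not in MNLI_LABELS:
        raise ValueError(f"unknown MNLI label: {mnli_label!r}")
    # Neutral and contradiction both count as non-entailment on HANS.
    return "entailment" if mnli_label == "entailment" else "non-entailment"


def hans_accuracy(predictions, gold, groups):
    """Return accuracy per group after collapsing the predictions.

    predictions: MNLI-style labels predicted by the model
    gold:        HANS-style gold labels
    groups:      a grouping key per example, e.g. the targeted heuristic
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for pred, label, group in zip(predictions, gold, groups):
        total[group] += 1
        correct[group] += int(collapse_to_hans(pred) == label)
    return {group: correct[group] / total[group] for group in total}


# Toy, made-up predictions (not results from the paper).
preds = ["entailment", "entailment", "neutral", "contradiction"]
gold = ["entailment", "non-entailment", "non-entailment", "non-entailment"]
groups = ["lexical_overlap", "lexical_overlap", "subsequence", "constituent"]

scores = hans_accuracy(preds, gold, groups)
for heuristic, acc in scores.items():
    print(f"{heuristic}: {acc:.2f}")
```

Breaking accuracy down by heuristic and by gold label matters here because a model that always predicts entailment scores about 50% overall on HANS while failing every non-entailment example.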

Code & Models

Repositories

prajjwal1/generalize_lm_nli (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Adversarial Robustness in Machine Learning · Natural Language Processing Techniques