Generalization in NLI: Ways (Not) To Go Beyond Simple Heuristics
Prajjwal Bhargava, Aleksandr Drozd, Anna Rogers

TL;DR
This paper studies how BERT-based models trained on MNLI generalize to the adversarially constructed HANS dataset. It tests five strategies and reports 2 that help and 3 that do not.
Contribution
A case study of generalization in NLI (from MNLI to HANS) that compares several BERT-based architectures (adapters, Siamese Transformers, HEX debiasing) with two further interventions: subsampling the training data and increasing the model size.
Findings
- 2 of the strategies tested improve generalization from MNLI to the adversarial HANS dataset
- 3 of the strategies tested do not improve it
- Both the successes and the failures give insight into how Transformer-based models learn to generalize
Abstract
Much of recent progress in NLU was shown to be due to models' learning dataset-specific heuristics. We conduct a case study of generalization in NLI (from MNLI to the adversarially constructed HANS dataset) in a range of BERT-based architectures (adapters, Siamese Transformers, HEX debiasing), as well as with subsampling the data and increasing the model size. We report 2 successful and 3 unsuccessful strategies, all providing insights into how Transformer-based models learn to generalize.
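Evaluating an MNLI-trained model on HANS needs a label mapping, because MNLI has three classes while HANS has two (entailment and non-entailment). The sketch below shows one common way to do this: fold neutral and contradiction into non-entailment and report accuracy per gold class. It is an illustration under stated assumptions, not the authors' evaluation code; the label strings and the helper names `to_hans_label` and `hans_accuracy` are hypothetical.

```python
from collections import Counter

# Assumed MNLI label names; real checkpoints may name or order them differently.
MNLI_LABELS = ("entailment", "neutral", "contradiction")


def to_hans_label(mnli_label):
    """Collapse a 3-way MNLI prediction into the 2-way HANS label space."""
    if mnli_label not in MNLI_LABELS:
        raise ValueError(f"unknown MNLI label: {mnli_label!r}")
    return "entailment" if mnli_label == "entailment" else "non-entailment"


def hans_accuracy(predictions, gold):
    """Return accuracy for each gold HANS class.

    predictions: iterable of MNLI label strings produced by the model
    gold: iterable of HANS label strings ("entailment" / "non-entailment")
    """
    predictions, gold = list(predictions), list(gold)
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    correct, total = Counter(), Counter()
    for pred, label in zip(predictions, gold):
        total[label] += 1
        if to_hans_label(pred) == label:
            correct[label] += 1
    return {label: correct[label] / total[label] for label in total}


# Toy data: a model that always predicts entailment, as a heuristic-driven
# model might on HANS-style examples.
toy_gold = ["entailment", "non-entailment", "entailment", "non-entailment"]
toy_preds = ["entailment"] * len(toy_gold)
toy_scores = hans_accuracy(toy_preds, toy_gold)
```

Reporting the two classes separately matters here: a model that always answers entailment scores perfectly on one class and zero on the other, which a single pooled accuracy would hide.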
Code & Models
- 🤗 prajjwal1/bert-tiny (model) · 769k downloads · ♡ 140
- 🤗 prajjwal1/albert-base-v1-mnli (model) · 14 downloads
- 🤗 prajjwal1/albert-base-v2-mnli (model) · 59 downloads
- 🤗 prajjwal1/bert-medium-mnli (model) · 1.9k downloads · ♡ 1
- 🤗 prajjwal1/bert-medium (model) · 7.7k downloads · ♡ 5
- 🤗 prajjwal1/bert-mini-mnli (model) · 7 downloads
- 🤗 prajjwal1/bert-mini (model) · 134k downloads · ♡ 23
- 🤗 prajjwal1/bert-small-mnli (model) · 42 downloads
- 🤗 prajjwal1/bert-small (model) · 23k downloads · ♡ 27
- 🤗 prajjwal1/bert-tiny-mnli (model) · 947 downloads · ♡ 4
Taxonomy
Topics: Topic Modeling · Adversarial Robustness in Machine Learning · Natural Language Processing Techniques
