Lexical Generalization Improves with Larger Models and Longer Training

Elron Bandel; Yoav Goldberg; Yanai Elazar

arXiv:2210.12673·cs.CL·October 26, 2022

TL;DR

This paper demonstrates that larger models and longer training reduce reliance on superficial lexical overlap heuristics in natural language inference, paraphrase detection, and reading comprehension, and that the disparity between model sizes originates in the pre-trained models.

Contribution

It shows that reliance on lexical overlap heuristics diminishes as model size and training length increase, and it introduces a contrastive reading comprehension dataset for measuring that reliance.

Findings

01 Larger models are less susceptible to lexical overlap heuristics.

02 Longer training reduces reliance on superficial heuristics.

03 Disparity between model sizes originates from pre-trained models.

Abstract

While fine-tuned language models perform well on many tasks, they have also been shown to rely on superficial surface features such as lexical overlap. Excessive use of such heuristics can lead to failure on challenging inputs. We analyze the use of lexical overlap heuristics in natural language inference, paraphrase detection, and reading comprehension (using a novel contrastive dataset), and find that larger models are much less susceptible to adopting lexical overlap heuristics. We also find that longer training leads models to abandon lexical overlap heuristics. Finally, we provide evidence that the disparity between model sizes has its source in the pre-trained model.
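
To make the notion of a lexical overlap heuristic concrete, here is a minimal Python sketch. It is an illustration only and is not taken from the paper or its repository: the function names, the whitespace-and-punctuation tokenizer, and the 0.9 threshold are all assumptions chosen for the example. It scores how many hypothesis words also appear in the premise, then mimics a model that predicts entailment whenever that score is high.

```python
import re


def tokenize(text):
    """Lowercase and keep alphanumeric runs (a simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def lexical_overlap(premise, hypothesis):
    """Fraction of hypothesis tokens that also occur in the premise."""
    premise_tokens = set(tokenize(premise))
    hypothesis_tokens = tokenize(hypothesis)
    if not hypothesis_tokens:
        return 0.0
    shared = sum(1 for tok in hypothesis_tokens if tok in premise_tokens)
    return shared / len(hypothesis_tokens)


def overlap_heuristic_label(premise, hypothesis, threshold=0.9):
    """Hypothetical heuristic: high overlap is treated as entailment.

    The threshold is an arbitrary illustrative value, not one from the paper.
    """
    if lexical_overlap(premise, hypothesis) >= threshold:
        return "entailment"
    return "non-entailment"


if __name__ == "__main__":
    premise = "The doctor visited the lawyer."
    hypothesis = "The lawyer visited the doctor."
    # Every hypothesis word appears in the premise, so the heuristic
    # predicts entailment even though the roles are swapped.
    print(lexical_overlap(premise, hypothesis))
    print(overlap_heuristic_label(premise, hypothesis))
```

The example pair has full word overlap but reversed roles, so the heuristic answers "entailment" where the correct label is non-entailment. Inputs of this kind are the challenging cases on which a model that leans on overlap fails.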

Code & Models

Repositories

elronbandel/lexical-generalization (Official)

Datasets

biu-nlp/alsqa
dataset · 39 dl

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications