Generalization vs. Memorization in the Presence of Statistical Biases in Transformers

John Mitros

arXiv:2409.04654 · cs.LG · September 11, 2024

Open Access

TL;DR

This paper investigates how statistical biases influence transformer models' ability to generalize, finding that reliance on spurious correlations impairs out-of-distribution performance and leads to overestimated generalization capabilities.

Contribution

It systematically evaluates the impact of statistical biases on transformers' generalization across synthetic tasks and analyzes model components' roles in this process.

Findings

01. Biases impair out-of-distribution performance

02. Transformers rely heavily on spurious correlations

03. Biases lead to overestimation of generalization capabilities

Abstract

This study aims to understand how statistical biases affect the model's ability to generalize to in-distribution and out-of-distribution data on algorithmic tasks. Prior research indicates that transformers may inadvertently learn to rely on these spurious correlations, leading to an overestimation of their generalization capabilities. To investigate this, we evaluate transformer models on several synthetic algorithmic tasks, systematically introducing and varying the presence of these biases. We also analyze how different components of the transformer models impact their generalization. Our findings suggest that statistical biases impair the model's performance on out-of-distribution data, leading to an overestimation of its generalization capabilities. The models rely heavily on these spurious correlations for inference, as indicated by their performance on tasks including such biases.
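
The idea of a statistical bias inflating measured generalization can be illustrated with a small sketch. Everything below is a hypothetical illustration, not the paper's tasks, data, or code: the parity task, the prepended "cue" token, the `bias` parameter, and the function names `make_dataset` and `shortcut_accuracy` are all assumptions made for this example. A predictor that only reads the spurious cue stands in for a model that has memorized the shortcut; it scores well when the bias is present and drops to roughly chance when the bias is removed.

```python
import random


def make_dataset(n, length, bias, seed):
    """Build a toy parity task with a spurious cue token (hypothetical setup).

    Each example is ([cue] + bits, label), where label is the parity of bits.
    With probability `bias` the cue is copied from the label; otherwise it is
    a uniformly random bit and carries no information about the label.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        bits = [rng.randint(0, 1) for _ in range(length)]
        label = sum(bits) % 2
        if rng.random() < bias:
            cue = label  # spurious correlation: cue leaks the label
        else:
            cue = rng.randint(0, 1)  # uninformative cue
        data.append(([cue] + bits, label))
    return data


def shortcut_accuracy(data):
    """Accuracy of a predictor that outputs the cue and ignores the bits."""
    correct = sum(1 for tokens, label in data if tokens[0] == label)
    return correct / len(data)


# In-distribution split: cue agrees with the label most of the time.
in_dist = make_dataset(n=2000, length=8, bias=0.9, seed=0)
# Out-of-distribution split: the bias is removed entirely.
out_dist = make_dataset(n=2000, length=8, bias=0.0, seed=1)

id_acc = shortcut_accuracy(in_dist)
ood_acc = shortcut_accuracy(out_dist)
print(f"shortcut accuracy, biased split:   {id_acc:.3f}")
print(f"shortcut accuracy, unbiased split: {ood_acc:.3f}")
```

In this toy setting, evaluating only on the biased split would suggest the parity task has been learned, even though the shortcut predictor never computes parity at all. Comparing the two splits is what exposes the gap.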

Taxonomy

Topics: Neural Networks and Applications