It Ain't That Bad: Understanding the Mysterious Performance Drop in OOD Generalization for Generative Transformer Models

Xingcheng Xu; Zihao Pan; Haipeng Zhang; Yanqing Yang

arXiv:2308.08268·cs.LG·August 20, 2024

TL;DR

This paper investigates why generative transformer models trained on n-digit arithmetic perform poorly on longer, out-of-distribution (OOD) inputs. It finds that the models learn algebraic structures that support a systematic form of generalization despite the apparent failure.

Contribution

The study shows that models generalize successfully in distribution (ID) because they learn structured representations, and it introduces the concept of equivalence generalization to describe their behavior on OOD inputs.

Findings

01. Models exhibit structured algebraic representations.

02. Models map OOD inputs to ID-equivalent outputs.

03. Strong ID generalization is due to learned algebraic structures.
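Finding 02 can be made concrete with a small check. This listing does not spell out the equivalence relation, so the sketch below assumes one plausible reading: an operand longer than n digits is treated as equivalent to its lowest n digits (the operand modulo 10^n). That relation, and the function names `id_equivalent`, `equivalence_prediction`, and `is_equivalence_generalization`, are illustrative assumptions, not the authors' definitions.

```python
def id_equivalent(operand: int, num_digits: int) -> int:
    """ASSUMED equivalence map: keep only the lowest num_digits decimal digits."""
    return operand % (10 ** num_digits)


def equivalence_prediction(a: int, b: int, num_digits: int) -> int:
    """Sum the model would produce if it added the ID-equivalent operands."""
    return id_equivalent(a, num_digits) + id_equivalent(b, num_digits)


def is_equivalence_generalization(model_output: int, a: int, b: int,
                                  num_digits: int) -> bool:
    """True if an output matches the assumed ID-equivalent sum."""
    return model_output == equivalence_prediction(a, b, num_digits)


# A model trained on 3-digit addition, queried with 4-digit operands.
true_sum = 1234 + 5678
equivalent_sum = equivalence_prediction(1234, 5678, num_digits=3)
```

Under this assumption, an output that is wrong as arithmetic can still be systematic: it differs from `true_sum` yet matches `equivalent_sum` exactly, which is the kind of pattern the finding describes.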

Abstract

Large language models (LLMs) have achieved remarkable proficiency in solving diverse problems. However, their generalization ability is not always satisfactory, and the generalization problem is common to generative transformer models in general. Researchers use basic mathematical tasks such as n-digit addition or multiplication as a lens for investigating generalization behavior. When models are trained on n-digit operations (e.g., addition) in which both input operands are n digits long, they generalize successfully to unseen n-digit inputs (in-distribution (ID) generalization) but fail miserably on longer, unseen cases (out-of-distribution (OOD) generalization). We bring this unexplained performance drop to attention and ask whether there is systematic OOD generalization. Towards understanding LLMs, we train various smaller language…
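The ID/OOD split described in the abstract can be sketched as a data generator: ID examples use operands of exactly n digits, and OOD examples use longer operands. The prompt format `a+b=`, the choice of n + 1 digits for the OOD set, and the helper names `sample_operand` and `make_addition_examples` are assumptions made for illustration; they are not taken from the paper or its repository.

```python
import random


def sample_operand(num_digits: int, rng: random.Random) -> int:
    """Draw an integer with exactly num_digits decimal digits (no leading zero)."""
    low = 10 ** (num_digits - 1)
    high = 10 ** num_digits - 1
    return rng.randint(low, high)


def make_addition_examples(num_digits: int, count: int, rng: random.Random):
    """Build (prompt, answer) pairs in the ASSUMED text format 'a+b='."""
    examples = []
    for _ in range(count):
        a = sample_operand(num_digits, rng)
        b = sample_operand(num_digits, rng)
        examples.append((f"{a}+{b}=", str(a + b)))
    return examples


rng = random.Random(0)  # fixed seed so the split is reproducible
n = 3
id_examples = make_addition_examples(n, 5, rng)       # same length as training data
ood_examples = make_addition_examples(n + 1, 5, rng)  # longer than anything trained on
```

Evaluating a trained model separately on `id_examples` and `ood_examples` would expose the gap the abstract refers to: high accuracy on the first set and a sharp drop on the second.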

Code & Models

Repositories

xingchengxu/exploregpt
PyTorch · Official

Taxonomy

Topics: Model Reduction and Neural Networks · Neural Networks and Applications · Computational Physics and Python Applications