Position: Understanding LLMs Requires More Than Statistical Generalization

Patrik Reizinger; Szilvia Ujváry; Anna Mészáros; Anna Kerekes; Wieland Brendel; Ferenc Huszár

arXiv:2405.01964·stat.ML·June 18, 2024·2 cites

Open Access · 1 Repo

TL;DR

This paper argues that understanding large language models (LLMs) requires more than statistical generalization: because autoregressive models are non-identifiable, models with equivalent test loss can behave markedly differently, which matters for zero-shot rule extrapolation, in-context learning, and fine-tunability.

Contribution

The paper argues that non-identifiability in autoregressive probabilistic models explains several LLM behaviors, and it supports this position with mathematical examples and empirical case studies.

Findings

01 Non-identifiability explains zero-shot rule extrapolation.

02 In-context learning is approximately non-identifiable.

03 Fine-tunability also exhibits non-identifiability issues.

Abstract

The last decade has seen blossoming research in deep learning theory attempting to answer, "Why does deep learning generalize?" A powerful shift in perspective precipitated this progress: the study of overparametrized models in the interpolation regime. In this paper, we argue that another perspective shift is due, since some of the desirable qualities of LLMs are not a consequence of good statistical generalization and require a separate theoretical explanation. Our core argument relies on the observation that autoregressive (AR) probabilistic models are inherently non-identifiable: models zero or near-zero KL divergence apart -- thus, equivalent test loss -- can exhibit markedly different behaviors. We support our position with mathematical examples and empirical observations, illustrating why non-identifiability has practical relevance through three case studies: (1) the non-identifiability of…
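The non-identifiability claim in the abstract can be made concrete with a toy example. The sketch below is not taken from the paper or its repository; the two-token vocabulary, the length-2 sequences, and the names `make_model`, `joint_prob`, `kl_divergence`, and `greedy_next` are all assumptions chosen for illustration. It builds two autoregressive models that assign identical probabilities to every full sequence (so the KL divergence between them is zero) yet continue an out-of-distribution prompt differently.

```python
import math

VOCAB = ("a", "b")  # hypothetical two-token vocabulary


def make_model(ood_next):
    """Toy AR model over length-2 sequences: maps a prefix to a next-token distribution."""
    return {
        "": {"a": 1.0, "b": 0.0},   # the first token is always "a"
        "a": {"a": 0.5, "b": 0.5},  # in-distribution prefix, shared by both models
        "b": ood_next,              # zero-probability prefix, free to differ
    }


def joint_prob(model, seq):
    """Probability of a full sequence under the chain rule."""
    p = 1.0
    for i, tok in enumerate(seq):
        p *= model[seq[:i]][tok]
    return p


def kl_divergence(p_model, q_model):
    """KL(p || q) over all length-2 sequences."""
    total = 0.0
    for first in VOCAB:
        for second in VOCAB:
            seq = first + second
            p, q = joint_prob(p_model, seq), joint_prob(q_model, seq)
            if p > 0.0:
                if q == 0.0:
                    return math.inf
                total += p * math.log(p / q)
    return total


def greedy_next(model, prefix):
    """Most likely next token after a prefix (ties go to the first token)."""
    dist = model[prefix]
    return max(dist, key=dist.get)


# The two models differ only in how they continue the unseen prompt "b".
model_1 = make_model({"a": 1.0, "b": 0.0})
model_2 = make_model({"a": 0.0, "b": 1.0})

print(kl_divergence(model_1, model_2))                       # identical sequence distributions
print(greedy_next(model_1, "b"), greedy_next(model_2, "b"))  # different behavior on prompt "b"
```

Because the prompt "b" never occurs under either model's own sequence distribution, no amount of held-out data drawn from that distribution can distinguish the two models, even though they behave differently when a user supplies that prompt.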

Code & Models

Repositories

rpatrik96/llm-non-identifiability
PyTorch · Official

Taxonomy

Topics: Artificial Intelligence in Law