Position: Understanding LLMs Requires More Than Statistical Generalization
Patrik Reizinger, Szilvia Ujváry, Anna Mészáros, Anna Kerekes, Wieland Brendel, Ferenc Huszár

TL;DR
This paper argues that understanding large language models (LLMs) requires more than statistical generalization. It emphasizes the non-identifiability of autoregressive models and its implications for LLM capabilities such as rule extrapolation, in-context learning, and fine-tunability.
Contribution
It argues that autoregressive probabilistic models are non-identifiable, so models with equal test loss can behave very differently, and it supports this claim with mathematical examples and empirical case studies.
Findings
Zero-shot rule extrapolation is non-identifiable.
In-context learning is approximately non-identifiable.
Fine-tunability is also non-identifiable.
Abstract
The last decade has seen blossoming research in deep learning theory attempting to answer, "Why does deep learning generalize?" A powerful shift in perspective precipitated this progress: the study of overparametrized models in the interpolation regime. In this paper, we argue that another perspective shift is due, since some of the desirable qualities of LLMs are not a consequence of good statistical generalization and require a separate theoretical explanation. Our core argument relies on the observation that autoregressive (AR) probabilistic models are inherently non-identifiable: models zero or near-zero KL divergence apart (and thus with equivalent test loss) can exhibit markedly different behaviors. We support our position with mathematical examples and empirical observations, illustrating why non-identifiability has practical relevance through three case studies: (1) the non-identifiability of…
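The core argument can be illustrated with a toy sketch; this is not the authors' construction. It assumes a hypothetical data distribution over the strings `ab` and `aabb` (the a^n b^n rule for n = 1, 2) and two hypothetical autoregressive models, `model_a` and `model_b`. Both copy the data's next-token conditionals on every prefix the data can produce and differ only on prefixes of zero probability. Because every transition the data never takes has probability zero, sampling never reaches those prefixes. The two models therefore define the same sequence distribution (KL divergence zero) and have identical test loss, yet they complete the out-of-distribution prompt `b` differently.

```python
import math

VOCAB = ("a", "b", "<eos>")

# Hypothetical toy data distribution: the a^n b^n rule for n = 1, 2, equally likely.
DATA = {
    ("a", "b", "<eos>"): 0.5,
    ("a", "a", "b", "b", "<eos>"): 0.5,
}


def data_conditional(prefix):
    """Next-token distribution implied by DATA, or None if the prefix has zero probability."""
    mass = {tok: 0.0 for tok in VOCAB}
    total = 0.0
    for seq, p in DATA.items():
        if len(seq) > len(prefix) and seq[:len(prefix)] == prefix:
            mass[seq[len(prefix)]] += p
            total += p
    if total == 0.0:
        return None
    return {tok: m / total for tok, m in mass.items()}


def make_model(off_support):
    """Hypothetical AR model: matches the data on-support, uses `off_support` elsewhere."""
    def model(prefix):
        cond = data_conditional(prefix)
        return cond if cond is not None else off_support(prefix)
    return model


# The two models differ only on prefixes that the data never produces.
model_a = make_model(lambda prefix: {"a": 0.0, "b": 0.0, "<eos>": 1.0})  # stop
model_b = make_model(lambda prefix: {"a": 0.0, "b": 1.0, "<eos>": 0.0})  # emit another b


def expected_nll(model):
    """Test loss: expected negative log-likelihood (nats per sequence) under DATA."""
    loss = 0.0
    for seq, p in DATA.items():
        nll = -sum(math.log(model(seq[:t])[tok]) for t, tok in enumerate(seq))
        loss += p * nll
    return loss


print(expected_nll(model_a), expected_nll(model_b))  # identical: both equal log 2
print(model_a(("b",)))  # {'a': 0.0, 'b': 0.0, '<eos>': 1.0}
print(model_b(("b",)))  # {'a': 0.0, 'b': 1.0, '<eos>': 0.0}
```

In this toy, the test loss cannot tell the two models apart. Their different behavior on the prompt therefore cannot be explained by statistical generalization alone.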
Taxonomy
Topics: Artificial Intelligence in Law
