Sparse Semantic Dimension as a Generalization Certificate for LLMs
Dibyanayan Bandyopadhyay, Asif Ekbal

TL;DR
This paper introduces the Sparse Semantic Dimension (SSD), a new complexity measure based on the sparse internal representations of LLMs that offers insight into their generalization and compressibility and supports safety monitoring.
Contribution
The paper formalizes SSD as a complexity measure derived from the low-dimensional, sparse semantic features active in an LLM's internal representations, linking representation geometry to generalization and safety.
Findings
SSD provides non-vacuous generalization certificates at realistic sample sizes.
Larger models like Gemma-2B learn more compressible, distinct semantic structures.
Out-of-distribution inputs cause a spike in active features, signaling uncertainty.
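The last finding suggests a simple runtime monitor: record how many SAE features fire per token on in-distribution text, then flag inputs whose count is unusually high. The sketch below is a minimal illustration under stated assumptions, not the authors' procedure. The mean-plus-k-standard-deviations rule, the threshold `k=3.0`, the function names `fit_l0_baseline` and `flag_ood`, and the toy counts are all hypothetical choices made for illustration.

```python
import statistics
from typing import Sequence, Tuple


def fit_l0_baseline(in_dist_counts: Sequence[int]) -> Tuple[float, float]:
    """Mean and sample standard deviation of per-token active-feature counts
    measured on in-distribution text (needs at least two values)."""
    return statistics.mean(in_dist_counts), statistics.stdev(in_dist_counts)


def flag_ood(token_counts: Sequence[int], baseline: Tuple[float, float],
             k: float = 3.0) -> Tuple[bool, float]:
    """Hypothetical detector: flag an input as out-of-distribution when its mean
    active-feature count exceeds the baseline mean by more than k std devs."""
    mu, sigma = baseline
    sigma = max(sigma, 1e-8)  # avoid division by zero if the baseline is constant
    score = (statistics.mean(token_counts) - mu) / sigma
    return score > k, score


# Hypothetical per-token active-feature counts, for illustration only.
baseline = fit_l0_baseline([20, 22, 19, 21, 23, 20, 18, 22])
print(flag_ood([40, 45, 38], baseline)[0])  # True: far more features fire
print(flag_ood([21, 20, 22], baseline)[0])  # False: close to the baseline
```

A z-score on mean feature counts is chosen here only because it is the simplest spike detector. A real monitor would likely calibrate the threshold on held-out data.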
Abstract
Standard statistical learning theory predicts that Large Language Models (LLMs) should overfit because their parameter counts vastly exceed the number of training tokens. Yet, in practice, they generalize robustly. We propose that the effective capacity controlling generalization lies in the geometry of the model's internal representations: while the parameter space is high-dimensional, the activation states lie on a low-dimensional, sparse manifold. To formalize this, we introduce the Sparse Semantic Dimension (SSD), a complexity measure derived from the active feature vocabulary of a Sparse Autoencoder (SAE) trained on the model's layers. Treating the LLM and SAE as frozen oracles, we use this framework to attribute the model's generalization capabilities to the sparsity of the dictionary rather than the total parameter count. Empirically, we validate this framework on GPT-2 Small…
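The abstract derives SSD from the "active feature vocabulary" of an SAE, but the exact SSD formula is not given in this excerpt. The sketch below illustrates only the underlying quantity, assuming the common ReLU SAE encoder f(x) = ReLU(W_enc (x - b_dec) + b_enc). It collects the set of dictionary features that fire on at least one input and the mean number firing per input (L0). The function names `sae_encode` and `active_vocabulary` and the toy weights are hypothetical, and this is not the authors' implementation.

```python
from typing import List, Sequence, Set, Tuple

Vector = Sequence[float]


def sae_encode(x: Vector, W_enc: Sequence[Vector], b_enc: Vector,
               b_dec: Vector) -> List[float]:
    """Assumed ReLU SAE encoder: ReLU(W_enc @ (x - b_dec) + b_enc).
    W_enc holds one row per dictionary feature."""
    centered = [xi - bi for xi, bi in zip(x, b_dec)]
    return [max(0.0, sum(w * c for w, c in zip(row, centered)) + b)
            for row, b in zip(W_enc, b_enc)]


def active_vocabulary(activations: Sequence[Vector], W_enc, b_enc, b_dec,
                      eps: float = 0.0) -> Tuple[Set[int], float]:
    """Return the feature indices that fire (> eps) on at least one input,
    plus the mean number of active features per input (mean L0)."""
    active: Set[int] = set()
    l0_counts = []
    for x in activations:
        code = sae_encode(x, W_enc, b_enc, b_dec)
        fired = {i for i, v in enumerate(code) if v > eps}
        active |= fired
        l0_counts.append(len(fired))
    return active, sum(l0_counts) / len(l0_counts)


# Toy 3-feature dictionary over 2-d activations (hypothetical values).
W_enc = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
b_enc = [0.0, 0.0, -0.5]
b_dec = [0.0, 0.0]
acts = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]

active, mean_l0 = active_vocabulary(acts, W_enc, b_enc, b_dec)
print(sorted(active), round(mean_l0, 3))  # [0, 1] 1.333
```

In this toy case the dictionary has three features but only two ever fire, so the active vocabulary is smaller than the dictionary. That gap between available and actually used capacity is the intuition the abstract appeals to.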
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Artificial Intelligence in Healthcare and Education · Explainable Artificial Intelligence (XAI)
