A Gauge Theory of Superposition: Toward a Sheaf-Theoretic Atlas of Neural Representations
Hossein Javidnia

TL;DR
This paper introduces a sheaf-theoretic gauge framework for analyzing superposition in large language models, revealing measurable obstructions to interpretability and providing bounds on interference and jamming.
Contribution
It develops a discrete gauge-theoretic framework for LLMs, built on a sheaf-theoretic atlas of local semantic charts, that enables local semantic analysis and quantifies obstructions to global interpretability.
Findings
Holonomy is computable and gauge-invariant after gauge fixing.
Shearing bounds transfer mismatch energy, indicating failure modes.
Certified bounds on jamming and interference are achieved with high coverage.
Abstract
We develop a discrete gauge-theoretic framework for superposition in large language models (LLMs) that replaces the single-global-dictionary premise with a sheaf-theoretic atlas of local semantic charts. Contexts are clustered into a stratified context complex; each chart carries a local feature space and a local information-geometric metric (Fisher/Gauss-Newton) identifying predictively consequential feature interactions. This yields a Fisher-weighted interference energy and three measurable obstructions to global interpretability: (O1) local jamming (active load exceeds Fisher bandwidth), (O2) proxy shearing (mismatch between geometric transport and a fixed correspondence proxy), and (O3) nontrivial holonomy (path-dependent transport around loops). We prove and instantiate four results on a frozen open LLM (Llama-3.2-3B Instruct) using WikiText-103, a C4-derived English web-text…
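To make obstruction (O3) concrete, here is a minimal sketch of holonomy around a three-chart loop. It is not the paper's procedure: it assumes transports between charts are plain orthogonal matrices, and the names `random_rotation` and `holonomy_defect` are hypothetical. It illustrates two things under those assumptions: transports that come from a single consistent choice of frames give zero holonomy, and the size of the holonomy defect does not change when each chart's basis is rotated (a gauge change).

```python
import numpy as np

def random_rotation(dim, rng):
    # QR of a Gaussian matrix yields an orthogonal matrix (illustrative transport).
    q, r = np.linalg.qr(rng.standard_normal((dim, dim)))
    return q * np.sign(np.diag(r))  # flip column signs; q stays orthogonal

def holonomy_defect(transports):
    # Compose the transports in loop order and measure distance from identity.
    dim = transports[0].shape[0]
    h = np.eye(dim)
    for t in transports:
        h = t @ h
    return np.linalg.norm(h - np.eye(dim), ord="fro")

rng = np.random.default_rng(0)
dim, n = 4, 3  # feature dimension and number of charts in the loop (assumed)

# Flat case: every transport is induced by per-chart frames, T_i = F_{i+1} F_i^T,
# so the product around the loop telescopes to the identity.
frames = [random_rotation(dim, rng) for _ in range(n)]
flat = [frames[(i + 1) % n] @ frames[i].T for i in range(n)]

# Generic case: independent transports, so the loop product is not the identity.
curved = [random_rotation(dim, rng) for _ in range(n)]

# Gauge change: rotate each chart's basis, T_i -> G_{i+1} T_i G_i^T.
gauges = [random_rotation(dim, rng) for _ in range(n)]
regauged = [gauges[(i + 1) % n] @ curved[i] @ gauges[i].T for i in range(n)]

flat_defect = holonomy_defect(flat)
curved_defect = holonomy_defect(curved)
regauged_defect = holonomy_defect(regauged)
print(flat_defect, curved_defect, regauged_defect)
```

The Frobenius norm is used here because it is unchanged by orthogonal conjugation, which is what makes the defect independent of the per-chart basis in this toy setting.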
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Topic Modeling · Generative Adversarial Networks and Image Synthesis
