Progressive Localisation in Localist LLMs
Joachim Diederich

TL;DR
This paper shows that progressive localization of attention in LLMs, in which attention locality increases gradually from early to late layers along a polynomial schedule, improves interpretability while keeping language modeling performance near baseline.
Contribution
It introduces progressive semantic localization, which combines adaptive semantic block partitioning with steep polynomial locality schedules to balance interpretability and performance in LLMs.
Findings
Progressive localization achieves near-baseline performance.
Steep polynomial schedules improve interpretability.
Flexible low-fidelity constraints preserve model capacity.
Abstract
This paper demonstrates that progressive localization, the gradual increase of attention locality from early distributed layers to late localized layers, represents the optimal architecture for creating interpretable large language models (LLMs) while preserving performance. Through systematic experimentation with GPT-2 fine-tuned on The Psychology of Artificial Superintelligence, we evaluate five locality configurations: two uniform baselines (fully distributed and fully localist) and three progressive polynomial schedules. We investigate whether interpretability constraints can be aligned with natural semantic structure while being applied strategically across network depth. We demonstrate that progressive semantic localization, combining adaptive semantic block partitioning with steep polynomial locality schedules, achieves near-baseline language modeling performance while providing…
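To make the idea of a polynomial locality schedule concrete, here is a minimal sketch. It assumes a schedule of the form strength(l) = (l / (L - 1))^p for layer index l in a network of L layers; this functional form, the function name `polynomial_locality_schedule`, and the parameters `power` and `max_strength` are illustrative assumptions and are not taken from the paper. A larger `power` gives a steeper schedule, keeping early layers close to fully distributed and concentrating locality in the last few layers.

```python
from typing import List


def polynomial_locality_schedule(
    num_layers: int, power: float, max_strength: float = 1.0
) -> List[float]:
    """Return one locality strength per layer, rising from 0 to max_strength.

    Hypothetical form (an assumption, not the paper's definition):
        strength(l) = max_strength * (l / (num_layers - 1)) ** power
    """
    if num_layers < 1:
        raise ValueError("num_layers must be at least 1")
    if power <= 0:
        raise ValueError("power must be positive")
    if num_layers == 1:
        # A single layer has no depth to progress over.
        return [max_strength]
    last = num_layers - 1
    return [max_strength * (layer / last) ** power for layer in range(num_layers)]


if __name__ == "__main__":
    # 12 layers is assumed here for illustration (the size of GPT-2 small).
    for p in (1.0, 2.0, 5.0):
        schedule = polynomial_locality_schedule(12, p)
        print(f"power={p}: " + " ".join(f"{s:.2f}" for s in schedule))
```

Under this assumed form, the two uniform baselines correspond to a constant strength of 0 (fully distributed) or `max_strength` (fully localist) at every layer, and the progressive schedules interpolate between them across depth.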
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Artificial Intelligence in Healthcare and Education
