Geospatial Mechanistic Interpretability of Large Language Models
Stef De Sabbata, Stefano Mizzaro, Kevin Roitero

TL;DR
This work introduces a framework for geospatial mechanistic interpretability of Large Language Models, using spatial analysis to study how these models internally represent and process geographical information.
Contribution
It combines spatial analysis with mechanistic interpretability techniques to analyze how LLMs handle geographical information, with the aim of advancing understanding of their internal representations.
Findings
Features for placenames display spatial autocorrelation.
Spatial patterns in internal features relate to geographic locations.
The framework aids in understanding how LLMs process geographical information.
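To make the first finding concrete, the sketch below shows one common way to quantify spatial autocorrelation: global Moran's I with row-standardised k-nearest-neighbour weights. This summary does not state which statistic or weighting scheme the authors use, so the choice of Moran's I, the function name `morans_i`, the parameter `k`, and the toy grid data are all assumptions made for illustration. In practice, `values` would hold the activation of a single internal feature for each placename and `coords` the placenames' projected coordinates.

```python
import numpy as np


def morans_i(values, coords, k=4):
    """Global Moran's I using row-standardised k-nearest-neighbour weights.

    Illustrative sketch only: `values` is one number per location (for
    example, one feature's activation per placename) and `coords` is an
    (n, 2) array of planar coordinates. Not taken from the paper.
    """
    values = np.asarray(values, dtype=float)
    coords = np.asarray(coords, dtype=float)
    n = len(values)

    # Pairwise Euclidean distances; a location is never its own neighbour.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)

    # Binary k-nearest-neighbour weights, then row-standardise.
    nearest = np.argsort(dist, axis=1)[:, :k]
    weights = np.zeros((n, n))
    weights[np.repeat(np.arange(n), k), nearest.ravel()] = 1.0
    weights /= weights.sum(axis=1, keepdims=True)

    # Moran's I: weighted cross-products of deviations over total variance.
    z = values - values.mean()
    return (n / weights.sum()) * (z @ weights @ z) / (z @ z)


# Toy data (hypothetical): a 5 x 5 grid of locations.
xs, ys = np.meshgrid(np.arange(5), np.arange(5))
coords = np.column_stack([xs.ravel(), ys.ravel()])

gradient = (xs + ys).ravel()            # smooth trend across space
checkerboard = ((xs + ys) % 2).ravel()  # neighbours alternate in value

print("gradient:", morans_i(gradient, coords))
print("checkerboard:", morans_i(checkerboard, coords))
```

A value well above zero indicates that nearby locations tend to have similar values (as in the gradient), while a negative value indicates that neighbours tend to differ (as in the checkerboard). Significance is usually assessed with a permutation test, which is omitted here for brevity.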
Abstract
Large Language Models (LLMs) have demonstrated unprecedented capabilities across various natural language processing tasks. Their ability to process and generate viable text and code has made them ubiquitous in many fields, while their deployment as knowledge bases and "reasoning" tools remains an area of ongoing research. In geography, a growing body of literature has been focusing on evaluating LLMs' geographical knowledge and their ability to perform spatial reasoning. However, very little is still known about the internal functioning of these models, especially about how they process geographical information. In this chapter, we establish a novel framework for the study of geospatial mechanistic interpretability - using spatial analysis to reverse engineer how LLMs handle geographical information. Our aim is to advance our understanding of the internal representations that these…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Semantic Web and Ontologies
Methods: Sparse Autoencoder
