Polar probe linearly decodes semantic structures from LLMs
Pablo J. Diego-Simón, Pierre Orhan, Emmanuel Chemla, Yair Lakretz, Jean-Rémi King

TL;DR
This paper shows that large language models encode semantic structures geometrically: a simple Polar Probe, applied to a linear subspace of layer activations, recovers whether two entities are related from the distance between their embeddings and how they are related from the direction between them.
Contribution
The study introduces a polar probe method to linearly decode semantic relations from LLMs, showing the emergence and generalization of this geometric code across layers and tasks.
Findings
Semantic relations are linearly recoverable in LLMs using polar coordinates.
The geometric code emerges mainly in middle layers and correlates with task performance.
The polar representation generalizes to new entities and relation types, but degrades with larger structures.
Abstract
How do artificial neural networks bind concepts to form complex semantic structures? Here, we propose a simple neural code, whereby the existence and the type of relations between entities are represented by the distance and the direction between their embeddings, respectively. We test this hypothesis in a variety of Large Language Models (LLMs), each input with natural-language descriptions of minimalist tasks from five different domains: arithmetic, visual scenes, family trees, metro maps and social interactions. Results show that the true semantic structures can be linearly recovered with a Polar Probe targeting a subspace of LLMs' layer activations. Second, this code emerges mostly in middle layers and improves with LLM performance. Third, these Polar Probes successfully generalize to new entities and relation types, but degrade with the size of the semantic structure. Finally, the…
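To make the distance-and-direction idea concrete, here is a minimal sketch of how such a code could be read out once a linear probe is available. It is an illustration, not the authors' implementation: the function `polar_decode`, the probe matrix `W`, the relation names, the unit-norm `prototypes`, and the `threshold` are all hypothetical. The convention that related pairs lie far apart in the probed subspace while unrelated pairs collapse toward zero is also an assumption; the abstract does not state which way the distance runs.

```python
import numpy as np

def polar_decode(h_a, h_b, W, prototypes, threshold):
    """Decode (relation_exists, relation_type) for an ordered pair of embeddings.

    Assumed convention (not taken from the paper): a pair is related when the
    projected difference vector is longer than `threshold`, and its type is the
    unit-norm prototype direction with the highest cosine similarity.
    """
    d = W @ (h_a - h_b)                # linear projection into the probe subspace
    dist = float(np.linalg.norm(d))    # distance -> does a relation exist?
    if dist < threshold:
        return False, None
    unit = d / dist                    # direction -> which relation type?
    names = list(prototypes)
    cosines = [float(unit @ prototypes[n]) for n in names]
    return True, names[int(np.argmax(cosines))]

# Toy data: 4-dimensional "activations" and a probe that keeps the first 2 dims.
W = np.eye(2, 4)                       # hypothetical probe, normally learned
prototypes = {                         # hypothetical unit-norm relation directions
    "left_of": np.array([1.0, 0.0]),
    "above": np.array([0.0, 1.0]),
}
h_cat = np.array([1.0, 0.0, 0.3, -0.2])
h_dog = np.array([0.0, 0.0, 0.5, 0.1])
h_sun = np.array([0.02, 0.01, -0.4, 0.9])

result_related = polar_decode(h_cat, h_dog, W, prototypes, threshold=0.5)
result_unrelated = polar_decode(h_dog, h_sun, W, prototypes, threshold=0.5)
print(result_related, result_unrelated)
```

In this toy setup the pair (cat, dog) differs mainly along the first probed axis, so it decodes as a "left_of" relation, while (dog, sun) differs only in dimensions the probe discards and decodes as unrelated. In practice `W` and the prototype directions would be fit to labeled structures rather than fixed by hand.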
