H-Probes: Extracting Hierarchical Structures From Latent Representations of Language Models
Cutter Dawes, Aryan Sharma, Angelos Ioannis Lagos, Shivam Raval

TL;DR
This paper introduces H-probes, a collection of linear probes that extract hierarchical structure, specifically depth and pairwise distance, from language model representations, showing how models encode hierarchy.
Contribution
The paper presents a probing method that uncovers hierarchical structure in the latent spaces of language models and shows that such structure is present in both synthetic and real-world reasoning tasks.
Findings
H-probes effectively identify hierarchical subspaces in synthetic tasks
Hierarchical structures are low-dimensional and causally important for task performance
Models encode hierarchy at multiple abstraction levels, including reasoning processes
Abstract
Representing and navigating hierarchy is a fundamental primitive of reasoning. Large language models have demonstrated proficiency in a wide variety of tasks requiring hierarchical reasoning, but there exists limited analysis on how the models geometrically represent the necessary latent constructions for such thinking. To this end, we develop H-probes, a collection of linear probes that extract hierarchical structure, specifically depth and pairwise distance, from latent representations. In synthetic tree traversal tasks, the H-probes robustly find the subspaces containing hierarchical structure necessary to complete the tasks; furthermore, in comprehensive ablation experiments, we show that these hierarchy-containing subspaces are low-dimensional, causally important for high task performance, and generalize within- and out-of-domain. Furthermore, we find analogous, though weaker,…
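To make the idea of a linear depth probe concrete, here is a minimal sketch on synthetic data. It is not the authors' implementation: the random tree, the fake "hidden states" (depth written along one random direction plus Gaussian noise), the ridge penalty, and every function name below are assumptions chosen for illustration. It fits a linear map from representations to node depth, scores it on held-out nodes, and compares against a control probe trained on shuffled depths.

```python
import numpy as np

def make_random_tree(n_nodes, rng):
    """Random rooted tree (node 0 is the root); returns parents and depths."""
    parents = np.full(n_nodes, -1)
    depths = np.zeros(n_nodes, dtype=int)
    for i in range(1, n_nodes):
        parents[i] = rng.integers(0, i)      # attach to an earlier node
        depths[i] = depths[parents[i]] + 1
    return parents, depths

def fit_linear_probe(H, y, ridge=1e-3):
    """Ridge regression with a bias term; returns weights w and bias b."""
    mu_h, mu_y = H.mean(axis=0), y.mean()
    Hc, yc = H - mu_h, y - mu_y
    gram = Hc.T @ Hc + ridge * np.eye(H.shape[1])
    w = np.linalg.solve(gram, Hc.T @ yc)
    b = mu_y - mu_h @ w
    return w, b

def r_squared(y_true, y_pred):
    """Coefficient of determination on the given samples."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(0)
n_nodes, dim = 400, 32

# Synthetic stand-in for model activations (an assumption, not real data):
# depth is encoded along a single unit direction, plus isotropic noise.
_, depths = make_random_tree(n_nodes, rng)
y = depths.astype(float)
direction = rng.normal(size=dim)
direction /= np.linalg.norm(direction)
H = rng.normal(scale=0.1, size=(n_nodes, dim)) + np.outer(y, direction)

train, test = np.arange(0, 300), np.arange(300, n_nodes)

# Depth probe, evaluated on held-out nodes.
w, b = fit_linear_probe(H[train], y[train])
r2_probe = r_squared(y[test], H[test] @ w + b)

# Control: same probe class trained on shuffled depths, which should not generalize.
y_shuffled = rng.permutation(y)
w_c, b_c = fit_linear_probe(H[train], y_shuffled[train])
r2_control = r_squared(y_shuffled[test], H[test] @ w_c + b_c)

print(f"held-out R^2, depth probe:   {r2_probe:.3f}")
print(f"held-out R^2, control probe: {r2_control:.3f}")
```

A large gap between the two scores suggests that depth is linearly decodable rather than memorized by the probe. With real activations, `H` would be replaced by hidden states collected at a chosen layer, and a pairwise-distance probe would need a different objective than the one sketched here.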
