Linear Representations of Hierarchical Concepts in Language Models
Masaki Sakata, Benjamin Heinzerling, Takumi Ito, Sho Yokoi, Kentaro Inui

TL;DR
This paper explores how language models encode hierarchical relations using linear transformations, revealing that such hierarchies are represented in low-dimensional, domain-specific subspaces that are highly interpretable.
Contribution
The study introduces a method to identify linear transformations for hierarchical relations, extending prior work to multi-token entities and cross-layer representations, and demonstrates their effectiveness across domains.
Findings
Hierarchical relations can be linearly recovered from model representations.
Hierarchical information is encoded in a low-dimensional, domain-specific subspace.
Hierarchy representations are highly similar across different domain-specific subspaces.
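As a rough illustration of what "low-dimensional" and "similar across subspaces" could mean operationally, the sketch below measures the effective rank of a linear map from its singular values and compares two subspaces through their principal angles. This is an assumption about one reasonable way to quantify these properties, not the analysis used in the paper; the helper names (`effective_rank`, `top_subspace`, `subspace_overlap`), the 0.99 energy threshold, and the synthetic matrices are all hypothetical.

```python
import numpy as np

def effective_rank(W, energy=0.99):
    """Smallest k such that the top-k singular values hold `energy` of the squared spectrum."""
    s = np.linalg.svd(W, compute_uv=False)
    cum = np.cumsum(s**2) / np.sum(s**2)
    return int(np.searchsorted(cum, energy) + 1)

def top_subspace(W, k):
    """Orthonormal basis (d x k) of the input directions that the map x @ W reads most strongly."""
    U, _, _ = np.linalg.svd(W, full_matrices=False)
    return U[:, :k]

def subspace_overlap(A, B):
    """Mean squared cosine of the principal angles between two orthonormal bases (1 = identical, 0 = orthogonal)."""
    s = np.linalg.svd(A.T @ B, compute_uv=False)
    return float(np.mean(s**2))

# Synthetic stand-in for a learned transformation: a rank-3 map in a 20-dimensional space.
rng = np.random.default_rng(0)
d, k = 20, 3
W_lowrank = rng.normal(size=(d, k)) @ rng.normal(size=(k, d))
rank_estimate = effective_rank(W_lowrank)

# Two mutually orthogonal 3-dimensional subspaces, taken from one orthonormal basis.
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))
basis_a, basis_b = Q[:, :k], Q[:, k:2 * k]
same_overlap = subspace_overlap(basis_a, basis_a)
orthogonal_overlap = subspace_overlap(basis_a, basis_b)
```

With real model representations, `W_lowrank` would be replaced by a transformation fitted for one domain and depth, and `basis_a` and `basis_b` by the top subspaces of two such transformations.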
Abstract
We investigate how and to what extent hierarchical relations (e.g., Japan ⊂ Eastern Asia ⊂ Asia) are encoded in the internal representations of language models. Building on Linear Relational Concepts, we train linear transformations specific to each hierarchical depth and semantic domain, and characterize representational differences associated with hierarchical relations by comparing these transformations. Going beyond prior work on the representational geometry of hierarchies in LMs, our analysis covers multi-token entities and cross-layer representations. Across multiple domains we learn such transformations and evaluate in-domain generalization to unseen data and cross-domain transfer. Experiments show that, within a domain, hierarchical relations can be linearly recovered from model representations. We then analyze how hierarchical information is encoded in…
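The idea of recovering a hierarchical relation with a linear transformation can be sketched as follows. This is a minimal illustration that fits an affine map by ordinary least squares on synthetic vectors and checks it on held-out items. It is a stand-in, not the Linear Relational Concepts training procedure the abstract builds on, and every name, dimension, and data point here (`fit_affine_map`, `predict_parent`, the 16-dimensional vectors, the noise level) is an assumption made for the example.

```python
import numpy as np

def fit_affine_map(X, Y):
    """Least-squares fit of Y ≈ X @ W + b, returning (W, b)."""
    X_aug = np.hstack([X, np.ones((X.shape[0], 1))])  # append a bias column
    sol, *_ = np.linalg.lstsq(X_aug, Y, rcond=None)
    return sol[:-1], sol[-1]

def predict_parent(x, W, b, parent_vecs):
    """Index of the parent vector most cosine-similar to the mapped child vector."""
    y = x @ W + b
    norms = np.linalg.norm(parent_vecs, axis=1) * np.linalg.norm(y) + 1e-12
    return int(np.argmax(parent_vecs @ y / norms))

# Synthetic data (hypothetical): 4 parent concepts, 200 child entities, 16 dimensions.
rng = np.random.default_rng(0)
d, n_parents, n_children = 16, 4, 200
parent_vecs = rng.normal(size=(n_parents, d))
labels = rng.integers(0, n_parents, size=n_children)

# Child vectors are built so that a single linear map sends each child near its parent.
W_true = rng.normal(size=(d, d))
noisy_parents = parent_vecs[labels] + 0.05 * rng.normal(size=(n_children, d))
X = noisy_parents @ np.linalg.inv(W_true)
Y = parent_vecs[labels]

# Fit on the first 150 children, evaluate on the remaining 50 (in-domain generalization).
train, test = slice(0, 150), slice(150, 200)
W, b = fit_affine_map(X[train], Y[train])
acc = float(np.mean([predict_parent(x, W, b, parent_vecs) == l
                     for x, l in zip(X[test], labels[test])]))
```

In an actual experiment, `X` would hold entity representations extracted from a model layer and `Y` the representations of their parent concepts at a given hierarchical depth. Cross-domain transfer could then be probed by applying a `W` fitted on one domain to entities from another.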
