Do Language Models Encode Semantic Relations? Probing and Sparse Feature Analysis
Andor Diera, Ansgar Scherp

TL;DR
This paper investigates how large language models encode semantic relations such as synonymy and hypernymy. It shows that the relations differ in how they are represented internally, and that their signals follow stable profiles across model sizes and layers.
Contribution
The paper combines linear probing with mechanistic interpretability techniques (sparse autoencoders and activation patching) to identify where and how semantic relations are represented inside LLMs, and it highlights causal effects that depend on model capacity.
Findings
Hypernymy is redundantly encoded and resistant to suppression.
Hyponymy relies on compact features that are easily disrupted.
Semantic relation signals peak in mid-layers and are stronger in post-residual pathways.
Abstract
Understanding whether large language models (LLMs) capture structured meaning requires examining how they represent concept relationships. In this work, we study three models of increasing scale (Pythia-70M, GPT-2, and Llama 3.1 8B) and focus on four semantic relations: synonymy, antonymy, hypernymy, and hyponymy. We combine linear probing with mechanistic interpretability techniques, including sparse autoencoders (SAEs) and activation patching, to identify where these relations are encoded and how specific features contribute to their representation. Our results reveal a directional asymmetry in hierarchical relations: hypernymy is encoded redundantly and resists suppression, while hyponymy relies on compact features that are more easily disrupted by ablation. More broadly, relation signals are diffuse but exhibit stable profiles: they peak in the mid-layers and are stronger in…
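To make the linear probing step mentioned in the abstract concrete, here is a minimal sketch of a softmax probe that predicts the relation label from a fixed activation vector. It is an illustration under stated assumptions, not the authors' setup: the activations are synthetic Gaussian clusters standing in for hidden states, and every name (`train_linear_probe`, `probe_accuracy`, `d_model`, the learning rate and epoch count) is hypothetical. Only the four relation labels come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# The four relations studied in the paper; everything else here is a toy assumption.
relations = ["synonymy", "antonymy", "hypernymy", "hyponymy"]
n_per_class, d_model = 200, 32  # hypothetical sample count and hidden size

# Synthetic stand-in for layer activations: one mean direction per relation plus noise.
class_means = rng.normal(scale=2.0, size=(len(relations), d_model))
X = np.concatenate(
    [class_means[k] + rng.normal(size=(n_per_class, d_model))
     for k in range(len(relations))]
)
y = np.repeat(np.arange(len(relations)), n_per_class)

# Shuffle, then hold out 20% of the examples for evaluation.
perm = rng.permutation(len(y))
X, y = X[perm], y[perm]
split = int(0.8 * len(y))
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]


def train_linear_probe(X, y, n_classes, lr=0.01, epochs=300):
    """Fit a single linear layer with softmax cross-entropy by gradient descent."""
    n, d = X.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]  # one-hot targets
    for _ in range(epochs):
        logits = X @ W + b
        logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - Y) / n  # gradient of mean cross-entropy w.r.t. logits
        W -= lr * (X.T @ grad)
        b -= lr * grad.sum(axis=0)
    return W, b


def probe_accuracy(W, b, X, y):
    """Fraction of examples whose highest-scoring class matches the label."""
    return float(((X @ W + b).argmax(axis=1) == y).mean())


W, b = train_linear_probe(X_train, y_train, len(relations))
test_acc = probe_accuracy(W, b, X_test, y_test)
print(f"held-out probe accuracy: {test_acc:.3f} (chance = {1 / len(relations):.2f})")
```

In a real experiment, `X` would be replaced by activations extracted from a chosen layer of the model, and the probe would be retrained per layer so that held-out accuracy can be compared across depth. High probe accuracy shows only that the relation is linearly decodable at that layer, which is presumably why the paper pairs probing with causal interventions such as ablation and activation patching.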
