Tracing Relational Knowledge Recall in Large Language Models
Nicholas Popovič, Michael Färber

TL;DR
This paper investigates how large language models recall relational knowledge, identifying which internal representations support relation classification and analyzing factors influencing their linear separability.
Contribution
It systematically evaluates latent representations derived from attention head and MLP contributions, shows that per-head attention contributions to the residual stream are comparatively strong features for linear relation classification, and analyzes the factors that affect probe accuracy.
Findings
Per-head attention contributions to the residual stream are comparatively strong features for linear relation classification.
Probe accuracy correlates with relation specificity and entity connectedness.
Token-level attribution reveals detailed probe behavior.
Abstract
We study how large language models recall relational knowledge during text generation, with a focus on identifying latent representations suitable for relation classification via linear probes. Prior work shows how attention heads and MLPs interact to resolve subject, predicate, and object, but it remains unclear which representations support faithful linear relation classification and why some relation types are easier to capture linearly than others. We systematically evaluate different latent representations derived from attention head and MLP contributions, showing that per-head attention contributions to the residual stream are comparatively strong features for linear relation classification. Feature attribution analyses of the trained probes, as well as characteristics of the different relation types, reveal clear correlations between probe accuracy and relation specificity,…
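For readers unfamiliar with linear probing, the following is a minimal sketch of the general technique, not the authors' implementation. It assumes each example is already summarized as a fixed-size feature vector with a relation label. Here, synthetic Gaussian clusters stand in for real per-head attention contributions, and all function names, dimensions, and hyperparameters are hypothetical choices made for illustration.

```python
import numpy as np


def make_synthetic_features(n_per_class=200, n_classes=3, dim=16, seed=0):
    """Hypothetical stand-in for latent features (one Gaussian cluster per relation)."""
    rng = np.random.default_rng(seed)
    centers = rng.normal(scale=3.0, size=(n_classes, dim))
    X = np.concatenate([c + rng.normal(size=(n_per_class, dim)) for c in centers])
    y = np.repeat(np.arange(n_classes), n_per_class)
    return X, y


def train_linear_probe(X, y, n_classes, lr=0.01, steps=500):
    """Fit a softmax-regression probe (a single linear layer) by gradient descent."""
    n, d = X.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]  # one-hot relation labels
    for _ in range(steps):
        logits = X @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - Y) / n  # gradient of mean cross-entropy w.r.t. logits
        W -= lr * (X.T @ grad)
        b -= lr * grad.sum(axis=0)
    return W, b


def probe_accuracy(W, b, X, y):
    """Fraction of examples whose highest-scoring class matches the label."""
    return float(((X @ W + b).argmax(axis=1) == y).mean())


X, y = make_synthetic_features()

# Hold out 20% of the examples to measure how linearly separable the classes are.
perm = np.random.default_rng(1).permutation(len(y))
split = int(0.8 * len(y))
train_idx, test_idx = perm[:split], perm[split:]

W, b = train_linear_probe(X[train_idx], y[train_idx], n_classes=3)
acc = probe_accuracy(W, b, X[test_idx], y[test_idx])
print(f"held-out probe accuracy: {acc:.3f}")
```

In a real setting, the rows of `X` would be replaced by representations extracted from the model, and held-out accuracy would serve as a measure of how linearly separable the relation types are in that representation.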
