Locating and Extracting Relational Concepts in Large Language Models
Zijian Wang, Britney White, Chang Xu

TL;DR
This paper uncovers hidden states in large language models that encode relational concepts, enabling their extraction and manipulation for improved interpretability and controllable fact recall.
Contribution
The paper identifies specific hidden states that represent relational concepts in LLMs and shows how they can be used for interpretability and controllable knowledge retrieval.
Findings
Hidden states at the last token position encode the causal effects of relational concepts
Extracted relational representations are transferable
Relational representations enable controllable fact recall
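The findings above can be illustrated with a toy sketch. Everything in it is a hypothetical assumption, not the authors' code or model: a four-dimensional "model" with one-hot embeddings whose last-token state first holds only the relation (the `relation_only` state) and then mixes in the subject (the `mixed` state). Transplanting the relation-only state from a source prompt into a target prompt changes which relation is recalled while keeping the target's subject. Transplanting the mixed state instead also drags in the source subject.

```python
# Hypothetical toy setup, not the paper's model: 4-d one-hot token embeddings.
EMBED = {"France": [1, 0, 0, 0], "Japan": [0, 1, 0, 0],
         "capital": [0, 0, 1, 0], "language": [0, 0, 0, 1]}
# Each answer's readout vector = subject direction + relation direction.
ANSWERS = {"Paris": [1, 0, 1, 0], "Tokyo": [0, 1, 1, 0],
           "French": [1, 0, 0, 1], "Japanese": [0, 1, 0, 1]}


def run(subject, relation, patch=None):
    """Return (top answer, last-token states) for a (subject, relation) prompt.

    `patch` optionally maps a state name to a vector that overwrites it.
    """
    patch = patch or {}
    states = {}
    # Stage 1: last-token state before subject information arrives (relation only).
    states["relation_only"] = patch.get("relation_only", list(EMBED[relation]))
    # Stage 2: crude stand-in for attention moving the subject into the last token.
    mixed = [s + r for s, r in zip(EMBED[subject], states["relation_only"])]
    states["mixed"] = patch.get("mixed", mixed)
    # Readout: score every candidate answer against the final last-token state.
    scores = {a: sum(x * w for x, w in zip(states["mixed"], v))
              for a, v in ANSWERS.items()}
    return max(scores, key=scores.get), states


# Extract states from a source prompt with a different subject and relation.
_, source = run("Japan", "language")

print(run("France", "capital")[0])  # Paris
print(run("France", "capital", patch={"relation_only": source["relation_only"]})[0])  # French
print(run("France", "capital", patch={"mixed": source["mixed"]})[0])  # Japanese
```

In a real LLM, relation-only states would first have to be located (for example with causal mediation analysis); in this toy they exist by construction, so the sketch only shows why such states would be transferable across subjects.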
Abstract
Relational concepts are foundational to the structure of knowledge representation: they link entity concepts to one another, allowing us to express and comprehend complex world knowledge. By expressing relational concepts in natural language prompts, people can effortlessly interact with large language models (LLMs) and recall desired factual knowledge. However, the process of knowledge recall lacks interpretability, and how LLMs represent relational concepts internally remains poorly understood. In this paper, we use causal mediation analysis of the fact recall process to identify hidden states that express entity and relational concepts. We find that at the last token position of the input prompt, there are hidden states that solely express the causal effects of relational concepts. Based on this finding, we assume that these hidden states…
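Causal mediation analysis of this kind is commonly implemented as activation patching: run a clean prompt and a corrupted prompt, copy a single hidden state from the clean run into the corrupted run, and measure how much the probability of the clean answer recovers. The sketch applies this to a hypothetical two-layer toy model; `forward`, `indirect_effects`, the vocabulary, and the choice to corrupt only the relation token are illustrative assumptions, not the paper's setup.

```python
import math

# Hypothetical toy "LLM": 4-d one-hot embeddings, two layers, two positions
# (subject token, relation token = last token). Not the paper's model.
EMBED = {"France": [1.0, 0.0, 0.0, 0.0], "Japan": [0.0, 1.0, 0.0, 0.0],
         "capital": [0.0, 0.0, 1.0, 0.0], "language": [0.0, 0.0, 0.0, 1.0]}
ANSWERS = {"Paris": [1.0, 0.0, 1.0, 0.0], "Tokyo": [0.0, 1.0, 1.0, 0.0],
           "French": [1.0, 0.0, 0.0, 1.0], "Japanese": [0.0, 1.0, 0.0, 1.0]}


def forward(tokens, patch=None):
    """Run the toy model and return (cache, answer probabilities).

    cache maps (layer, position) -> hidden vector. `patch` maps some of those
    keys to vectors that overwrite the computed state (activation patching).
    """
    patch = patch or {}
    cache = {}

    def store(key, vec):
        cache[key] = patch.get(key, vec)
        return cache[key]

    # Layer 0: token embeddings.
    h0 = [store((0, p), EMBED[t]) for p, t in enumerate(tokens)]
    # Layer 1: each position sums itself and all earlier positions,
    # a crude stand-in for causal attention.
    for p in range(len(tokens)):
        store((1, p), [sum(col) for col in zip(*h0[: p + 1])])
    # Readout from the last token, followed by a softmax over answers.
    last = cache[(1, len(tokens) - 1)]
    logits = {a: sum(x * w for x, w in zip(last, v)) for a, v in ANSWERS.items()}
    z = sum(math.exp(v) for v in logits.values())
    return cache, {a: math.exp(v) / z for a, v in logits.items()}


def indirect_effects(clean, corrupted, target):
    """Indirect effect of each hidden state: P(target) after restoring that one
    state from the clean run into the corrupted run, minus P(target) in the
    unpatched corrupted run."""
    clean_cache, _ = forward(clean)
    _, corrupted_probs = forward(corrupted)
    base = corrupted_probs[target]
    return {key: forward(corrupted, patch={key: vec})[1][target] - base
            for key, vec in clean_cache.items()}


effects = indirect_effects(["France", "capital"], ["France", "language"], "Paris")
for key, value in sorted(effects.items()):
    print(key, round(value, 3))
```

Because the subject is identical in both prompts, the states at the subject position show zero effect, while the relation token's embedding and the last-token layer-1 state show equal positive effects. A real study would patch the hidden states of an actual transformer, layer by layer and position by position.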
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
