Causality $\neq$ Invariance: Function and Concept Vectors in LLMs
Gustaw Opiełka, Hannes Rosenbusch, Claire E. Stevenson

TL;DR
This paper investigates how large language models represent concepts, revealing that Function Vectors are input-format dependent while Concept Vectors offer more stable, cross-format representations, with implications for model understanding and generalization.
Contribution
The study introduces Concept Vectors as a more stable concept representation in LLMs, contrasting with Function Vectors, and analyzes their emergence, differences, and generalization capabilities.
Findings
FVs are nearly orthogonal across different input formats.
CVs are more stable and generalize better out-of-distribution.
FVs perform best in in-distribution settings where the input format matches the one they were extracted from.
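To make "nearly orthogonal" concrete: two vectors are orthogonal when their cosine similarity is 0. The sketch below computes that quantity for toy vectors. The arrays are hand-picked stand-ins, not values extracted from any model, and the variable names are illustrative only.

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0 means orthogonal."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy stand-ins for FVs of the same concept (NOT real model data).
fv_open_ended = np.array([1.0, 0.0, 0.2, 0.0])       # hypothetical: open-ended prompts
fv_multiple_choice = np.array([0.0, 1.0, 0.0, 0.2])  # hypothetical: multiple-choice prompts
fv_open_ended_rerun = np.array([0.9, 0.1, 0.2, 0.0]) # hypothetical: same format, new sample

cross_format = cosine_similarity(fv_open_ended, fv_multiple_choice)
same_format = cosine_similarity(fv_open_ended, fv_open_ended_rerun)

print(f"cross-format cosine: {cross_format:.3f}")
print(f"same-format cosine:  {same_format:.3f}")
```

In this toy setup the cross-format pair has a cosine of zero while the same-format pair is close to one, which is the qualitative pattern the first finding describes for FVs.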
Abstract
Do large language models (LLMs) represent concepts abstractly, i.e., independent of input format? We revisit Function Vectors (FVs), compact representations of in-context learning (ICL) tasks that causally drive task performance. Across multiple LLMs, we show that FVs are not fully invariant: FVs are nearly orthogonal when extracted from different input formats (e.g., open-ended vs. multiple-choice), even if both target the same concept. We identify Concept Vectors (CVs), which carry more stable concept representations. Like FVs, CVs are composed of attention head outputs; however, unlike FVs, the constituent heads are selected using Representational Similarity Analysis (RSA) based on whether they encode concepts consistently across input formats. While these heads emerge in similar layers to FV-related heads, the two sets are largely distinct, suggesting different underlying…
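The RSA-based head selection mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes each attention head yields one activation vector per prompt, scores a head by the Pearson correlation between its pairwise cosine-similarity matrix and a concept-identity matrix (1 where two prompts share a concept, 0 otherwise), and keeps the top-k heads. The head names, activations, and the choice of Pearson correlation are all assumptions made for illustration.

```python
import numpy as np

def similarity_matrix(acts):
    """Pairwise cosine similarity between rows (one row per prompt)."""
    normed = acts / np.linalg.norm(acts, axis=1, keepdims=True)
    return normed @ normed.T

def rsa_score(acts, concept_labels):
    """Correlate a head's similarity matrix with a concept-identity matrix."""
    labels = np.asarray(concept_labels)
    sim = similarity_matrix(np.asarray(acts, dtype=float))
    same_concept = (labels[:, None] == labels[None, :]).astype(float)
    iu = np.triu_indices(len(labels), k=1)  # off-diagonal upper triangle only
    return float(np.corrcoef(sim[iu], same_concept[iu])[0, 1])

def select_heads(head_acts, concept_labels, k):
    """Return the k head names with the highest RSA score."""
    scores = {name: rsa_score(a, concept_labels) for name, a in head_acts.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Four toy prompts: two concepts, each shown in two input formats.
# Prompt order: (antonym, open-ended), (antonym, multiple-choice),
#               (synonym, open-ended), (synonym, multiple-choice)
concepts = ["antonym", "antonym", "synonym", "synonym"]

# Hypothetical activations (NOT real model data).
head_acts = {
    # Clusters by concept regardless of format.
    "concept_like_head": np.array([[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]]),
    # Clusters by format regardless of concept.
    "format_like_head": np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.1], [0.1, 1.0]]),
}

score_concept = rsa_score(head_acts["concept_like_head"], concepts)
score_format = rsa_score(head_acts["format_like_head"], concepts)
selected = select_heads(head_acts, concepts, k=1)
print(selected, round(score_concept, 3), round(score_format, 3))
```

Because the concept-identity matrix ignores format, a head scores highly only if prompts about the same concept look alike even when their formats differ, which is the cross-format consistency the abstract describes.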
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI)
