A Geometric Notion of Causal Probing
Clément Guerner, Tianyu Liu, Anej Svete, Alexander Warstadt, Ryan Cotterell

TL;DR
This paper introduces an intrinsic, geometric, and information-theoretic approach to identifying and manipulating concept subspaces in language models, demonstrating effective concept erasure and controlled generation without auxiliary classifiers.
Contribution
It proposes a set of intrinsic criteria that characterize an ideal linear concept subspace, allowing the subspace to be identified solely from the language model's distribution and advancing the understanding of how concepts are encoded in language models.
Findings
Linear concept erasure effectively removes concept information.
Concept subspaces can be used for precise manipulation of generated words.
The framework accounts for spurious correlations in the representation space.
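Once a concept subspace is known, the erasure and manipulation findings rest on simple linear-algebra operations. The Python sketch below assumes an orthonormal basis for the subspace is already given. Identifying that subspace from the language model's distribution is the paper's contribution, and the sketch does not attempt it. A plain orthogonal projection stands in for whatever erasure procedure the paper actually uses. The names `erase`, `intervene`, `next_word_probs`, and `number_basis`, along with the three-dimensional toy representation and the two-word unembedding, are all hypothetical.

```python
import math
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def project(h: Vector, basis: List[Vector]) -> Vector:
    """Component of h inside span(basis), i.e. B B^T h; basis must be orthonormal."""
    out = [0.0] * len(h)
    for b in basis:
        coef = dot(h, b)  # coordinate of h along this basis vector
        out = [o + coef * x for o, x in zip(out, b)]
    return out


def erase(h: Vector, basis: List[Vector]) -> Vector:
    """Concept erasure by orthogonal projection: h - B B^T h."""
    par = project(h, basis)
    return [x - p for x, p in zip(h, par)]


def intervene(h: Vector, basis: List[Vector], coords: List[float]) -> Vector:
    """Keep the orthogonal part of h but overwrite its subspace coordinates with coords."""
    out = erase(h, basis)
    for c, b in zip(coords, basis):
        out = [o + c * x for o, x in zip(out, b)]
    return out


def softmax(logits: Vector) -> Vector:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def next_word_probs(h: Vector, unembed: List[Vector]) -> Vector:
    """Toy next-word distribution softmax(W h), with one row of W per word."""
    return softmax([dot(row, h) for row in unembed])


# Hypothetical toy setup: a 3-d representation, a 1-d "verbal number"
# subspace spanned by the first axis, and two candidate words.
number_basis = [[1.0, 0.0, 0.0]]
unembed = [[1.0, 0.0, 0.0],   # "is"  (singular)
           [-1.0, 0.0, 0.0]]  # "are" (plural)
h = [2.0, -1.0, 0.5]

erased = erase(h, number_basis)               # [0.0, -1.0, 0.5]
steered = intervene(h, number_basis, [-3.0])  # [-3.0, -1.0, 0.5]

print(next_word_probs(h, unembed))        # p("is") is about 0.982
print(next_word_probs(erased, unembed))   # [0.5, 0.5]: no preference left
print(next_word_probs(steered, unembed))  # p("are") is now close to 1
```

In this toy setting, erasing the subspace leaves the model with no preference between the two words. Overwriting the subspace coordinate flips the preference without touching the orthogonal part of h. This is the geometric intuition behind using a concept subspace for controlled generation.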
Abstract
The linear subspace hypothesis (Bolukbasi et al., 2016) states that, in a language model's representation space, all information about a concept such as verbal number is encoded in a linear subspace. Prior work has relied on auxiliary classification tasks to identify and evaluate candidate subspaces that might give support for this hypothesis. We instead give a set of intrinsic criteria which characterize an ideal linear concept subspace and enable us to identify the subspace using only the language model distribution. Our information-theoretic framework accounts for spuriously correlated features in the representation space (Kumar et al., 2022) by reconciling the statistical notion of concept information and the geometric notion of how concepts are encoded in the representation space. As a byproduct of this analysis, we hypothesize a causal process for how a language model might…
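The geometric notion in the abstract can be made concrete with a standard decomposition. As a sketch, assume the concept subspace is k-dimensional, with an orthonormal basis collected in the columns of a d × k matrix B. This notation is introduced here for illustration and is not taken from the paper. Then P = BB^T is the orthogonal projector onto the subspace, and every representation h splits into two orthogonal parts:

```latex
\begin{aligned}
P &= B B^\top, \qquad B^\top B = I_k, \\
\mathbf{h} &= \underbrace{P\mathbf{h}}_{\mathbf{h}_\parallel}
  + \underbrace{(I_d - P)\,\mathbf{h}}_{\mathbf{h}_\perp}, \\
\mathbf{h}_\parallel^\top \mathbf{h}_\perp
  &= \mathbf{h}^\top B B^\top (I_d - B B^\top)\,\mathbf{h}
   = \mathbf{h}^\top \bigl(B B^\top - B (B^\top B) B^\top\bigr)\mathbf{h}
   = \mathbf{h}^\top (B B^\top - B B^\top)\,\mathbf{h} = 0.
\end{aligned}
```

In this idealized reading of the linear subspace hypothesis, everything the representation encodes about the concept sits in h_∥. Erasing the concept therefore corresponds to keeping only h_⊥.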
Taxonomy
Topics: Philosophy and History of Science
