Latent Causal Probing: A Formal Perspective on Probing with Causal Models of Data
Charles Jin, Martin Rinard

TL;DR
This paper introduces a formal causal framework for probing language models, enabling more robust analysis of whether models learn and represent underlying latent causal structures in data.
Contribution
The paper develops a structural causal model perspective on probing, providing a formal basis and empirical methods for assessing whether LMs capture latent causal variables.
Findings
Probes can reliably indicate whether an LM has learned latent causal concepts.
Empirical evidence shows that LMs induce the underlying causal structure in synthetic tasks.
The framework improves robustness and interpretability of probing results.
Abstract
As language models (LMs) deliver increasing performance on a range of NLP tasks, probing classifiers have become an indispensable technique in the effort to better understand their inner workings. A typical setup involves (1) defining an auxiliary task consisting of a dataset of text annotated with labels, then (2) supervising small classifiers to predict the labels from the representations of a pretrained LM as it processed the dataset. A high probing accuracy is interpreted as evidence that the LM has learned to perform the auxiliary task as an unsupervised byproduct of its original pretraining objective. Despite the widespread usage of probes, however, the robust design and analysis of probing experiments remains a challenge. We develop a formal perspective on probing using structural causal models (SCM). Specifically, given an SCM which explains the distribution of tokens observed…
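The two-step probing setup described in the abstract can be illustrated with a minimal sketch. This is not the authors' method or code: the "representations" below are synthetic vectors standing in for the hidden states of a pretrained LM, and the names `train_linear_probe` and `probe_accuracy`, the dimensions, and the hyperparameters are all hypothetical choices made for illustration.

```python
import numpy as np

def train_linear_probe(reps, labels, lr=0.1, steps=500):
    """Fit a logistic-regression probe with plain gradient descent."""
    n, d = reps.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(steps):
        logits = reps @ w + b
        probs = 1.0 / (1.0 + np.exp(-logits))
        grad = probs - labels              # gradient of the mean log-loss w.r.t. logits
        w -= lr * (reps.T @ grad) / n
        b -= lr * grad.mean()
    return w, b

def probe_accuracy(reps, labels, w, b):
    """Fraction of examples whose label the probe predicts correctly."""
    preds = (reps @ w + b > 0).astype(int)
    return float((preds == labels).mean())

rng = np.random.default_rng(0)
n, d = 400, 16

# Step (1): an auxiliary task, here a binary label per example (synthetic).
labels = rng.integers(0, 2, size=n)

# Assumption: stand-in for LM representations. Each vector is Gaussian noise
# plus a label-dependent shift along one fixed direction.
direction = rng.normal(size=d)
reps = rng.normal(size=(n, d)) + np.outer(2 * labels - 1, direction)

# Step (2): supervise a small classifier on the representations,
# then measure accuracy on held-out examples.
train, test = slice(0, 300), slice(300, n)
w, b = train_linear_probe(reps[train], labels[train])
acc = probe_accuracy(reps[test], labels[test], w, b)
print(f"held-out probe accuracy: {acc:.2f}")
```

In a real experiment, `reps` would be extracted from the LM as it processes the annotated dataset. Under the standard interpretation described in the abstract, a high held-out accuracy is read as evidence that the representations encode the auxiliary labels.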
Taxonomy
Topics: Semantic Web and Ontologies · Business Process Modeling and Analysis · Data Quality and Management
