Latent Causal Probing: A Formal Perspective on Probing with Causal Models of Data

Charles Jin; Martin Rinard

arXiv:2407.13765 · cs.CL · August 1, 2024 · 1 citation

TL;DR

This paper introduces a formal causal framework for probing language models, enabling more robust analysis of whether models learn and represent underlying latent causal structures in data.

Contribution

It develops a structural causal model perspective on probing, providing a formal basis and empirical methods to assess if LMs capture latent causal variables.

Findings

01. Probes can reliably indicate latent causal concept learning in LMs.

02. Empirical evidence shows LMs induce underlying causal structures in synthetic tasks.

03. The framework improves robustness and interpretability of probing results.

Abstract

As language models (LMs) deliver increasing performance on a range of NLP tasks, probing classifiers have become an indispensable technique in the effort to better understand their inner workings. A typical setup involves (1) defining an auxiliary task consisting of a dataset of text annotated with labels, then (2) supervising small classifiers to predict the labels from the representations of a pretrained LM as it processed the dataset. A high probing accuracy is interpreted as evidence that the LM has learned to perform the auxiliary task as an unsupervised byproduct of its original pretraining objective. Despite the widespread usage of probes, however, the robust design and analysis of probing experiments remains a challenge. We develop a formal perspective on probing using structural causal models (SCM). Specifically, given an SCM which explains the distribution of tokens observed…
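The two-step probing setup described in the abstract can be illustrated with a minimal sketch. This is not the authors' code or their SCM-based framework. It assumes synthetic vectors as a stand-in for LM hidden states, with the auxiliary label deliberately leaked into one coordinate, and trains a small logistic-regression probe on them. All function names and parameters here (`make_synthetic_representations`, `train_linear_probe`, `probe_accuracy`) are hypothetical.

```python
import math
import random


def make_synthetic_representations(n, dim, seed=0):
    """Assumption: stand-in for LM hidden states paired with auxiliary-task labels.

    The binary label shifts the first coordinate, so the label is
    linearly decodable from the vector by construction.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        vec = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        vec[0] += 3.0 if label == 1 else -3.0
        data.append((vec, label))
    return data


def _sigmoid(z):
    # Clamp the logit to avoid overflow in math.exp.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def train_linear_probe(data, dim, epochs=20, lr=0.1):
    """Fit a logistic-regression probe with plain SGD on (vector, label) pairs."""
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for vec, label in data:
            z = sum(wi * xi for wi, xi in zip(w, vec)) + b
            grad = _sigmoid(z) - label  # gradient of the log loss w.r.t. z
            w = [wi - lr * grad * xi for wi, xi in zip(w, vec)]
            b -= lr * grad
    return w, b


def probe_accuracy(w, b, data):
    """Fraction of examples whose label the probe predicts correctly."""
    correct = 0
    for vec, label in data:
        z = sum(wi * xi for wi, xi in zip(w, vec)) + b
        if (z > 0) == (label == 1):
            correct += 1
    return correct / len(data)


if __name__ == "__main__":
    dim = 8
    train = make_synthetic_representations(400, dim, seed=0)
    held_out = make_synthetic_representations(200, dim, seed=1)
    w, b = train_linear_probe(train, dim)
    print("held-out probing accuracy:", probe_accuracy(w, b, held_out))
```

In a real experiment the synthetic vectors would be replaced by representations extracted from a pretrained LM. As the abstract notes, a high accuracy from such a probe is only suggestive on its own, and interpreting it robustly is the problem the paper addresses.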

Code & Models

Repositories

charlesjin/emergent-semantics
pytorch

Taxonomy

Topics: Semantic Web and Ontologies · Business Process Modeling and Analysis · Data Quality and Management