Counterfactual reasoning: an analysis of in-context emergence
Moritz Miller, Bernhard Schölkopf, Siyuan Guo

TL;DR
This paper investigates how large language models perform in-context counterfactual reasoning, showing that they can predict hypothetical outcomes and identifying the mechanisms behind this capability.
Contribution
The paper analyzes counterfactual reasoning in language models in detail, introduces noise abduction heads, and extends the analysis to sequential data and SDE dynamics.
Findings
Language models can perform counterfactual reasoning in synthetic tasks.
Self-attention, depth, and data diversity influence performance.
Latent concepts are linearly represented in residual streams.
Abstract
Large-scale neural language models exhibit remarkable performance in in-context learning: the ability to learn and reason about the input context on the fly. This work studies in-context counterfactual reasoning in language models, that is, the ability to predict consequences of a hypothetical scenario. We focus on a well-defined, synthetic linear regression task that requires noise abduction. Accurate prediction is based on (1) inferring an unobserved latent concept and (2) copying contextual noise from factual observations. We show that language models are capable of counterfactual reasoning. Further, we enhance existing identifiability results and reduce counterfactual reasoning for a broad class of functions to a transformation on in-context observations. In Transformers, we find that self-attention, model depth and pre-training data diversity drive performance. Moreover, we provide…
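The two steps named in the abstract (inferring a latent concept, then copying contextual noise) can be illustrated with a small numerical sketch. This is not the paper's code or its exact task. It assumes a scalar model y = beta * x + u, where the latent concept is the slope beta shared across the context and the noise u is shared between each factual observation and its counterfactual. The function names `sample_context` and `predict_counterfactual` and all parameter values are hypothetical.

```python
import numpy as np


def sample_context(beta, n_examples, rng):
    """Draw in-context tuples (x, y, x_cf, y_cf) that share the latent slope beta.

    Assumed toy model: each factual/counterfactual pair reuses its own noise u.
    """
    x = rng.normal(size=n_examples)
    x_cf = rng.normal(size=n_examples)
    u = rng.normal(size=n_examples)
    y = beta * x + u        # factual outcome
    y_cf = beta * x_cf + u  # counterfactual outcome with the same noise
    return x, y, x_cf, y_cf


def predict_counterfactual(context, x_q, y_q, x_cf_q):
    """Predict the counterfactual outcome for a query from in-context examples."""
    x, y, x_cf, y_cf = context
    dx = x_cf - x
    dy = y_cf - y
    # Step 1: infer the latent concept. The shared noise cancels in the
    # differences, so a least-squares fit through the origin recovers beta.
    beta_hat = np.dot(dx, dy) / np.dot(dx, dx)
    # Step 2: noise abduction. Recover the query's noise from its factual pair.
    u_hat = y_q - beta_hat * x_q
    # Prediction: reuse the abducted noise under the hypothetical input.
    return beta_hat * x_cf_q + u_hat


rng = np.random.default_rng(0)
beta_true = 1.7  # hypothetical latent slope
context = sample_context(beta_true, n_examples=8, rng=rng)

x_q, x_cf_q, u_q = 0.5, -1.2, 0.3  # hypothetical query values
y_q = beta_true * x_q + u_q
prediction = predict_counterfactual(context, x_q, y_q, x_cf_q)
target = beta_true * x_cf_q + u_q
print(prediction, target)
```

In this toy setting the prediction reduces to y_q + beta_hat * (x_cf_q - x_q), a transformation of the in-context observations alone, which mirrors the kind of reduction the abstract describes for a broader class of functions.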
Taxonomy
Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Multimodal Machine Learning Applications
Methods: Linear Regression · Focus
