Detecting and Mitigating Treatment Leakage in Text-Based Causal Inference: Distillation and Sensitivity Analysis
Adel Daoud, Richard Johansson, Connor T. Jerzak

TL;DR
This paper identifies treatment leakage as a source of bias in text-based causal inference, formalizes its definition, and proposes four distillation methods that reduce this bias while retaining confounder information. The methods are validated through simulations and empirical data.
Contribution
The paper introduces formal definitions of treatment leakage and presents four novel text distillation techniques to reduce bias in text-based causal inference.
Findings
Moderate distillation balances bias reduction and confounder retention.
Overly strict distillation reduces estimate precision.
The distillation methods improve the accuracy of causal estimates in the empirical application.
Abstract
Text-based causal inference increasingly employs textual data as proxies for unobserved confounders, yet this approach introduces a previously undertheorized source of bias: treatment leakage. Treatment leakage occurs when text intended to capture confounding information also contains signals predictive of treatment status, thereby inducing post-treatment bias in causal estimates. Critically, this problem can arise even when documents precede treatment assignment, as authors may employ future-referencing language that anticipates subsequent interventions. Despite growing recognition of this issue, no systematic methods exist for identifying and mitigating treatment leakage in text-as-confounder applications. This paper addresses this gap through three contributions. First, we provide formal statistical and set-theoretic definitions of treatment leakage that clarify when and why bias…
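The leakage mechanism described in the abstract can be illustrated with a toy simulation. This is a minimal sketch under strong assumptions, not the paper's method: the text is collapsed into a hypothetical scalar proxy `w`, the data-generating process is linear, and the names `estimate_effect`, `leak`, `tau`, and `beta` are invented for illustration. When the proxy equals the confounder, adjusting for it recovers the true effect; when it also carries a treatment signal, the adjusted estimate is biased.

```python
import numpy as np

def estimate_effect(leak, n=20_000, tau=1.0, beta=2.0, seed=0):
    """Return the OLS coefficient on treatment when adjusting for a proxy w.

    Assumed toy model (hypothetical, for illustration only):
      u : unobserved confounder
      t : binary treatment that depends on u
      y : outcome, y = tau * t + beta * u + noise
      w : text-derived proxy, w = u + leak * t
    leak = 0 means the proxy carries only confounder information;
    leak > 0 means the proxy also encodes treatment status.
    """
    rng = np.random.default_rng(seed)
    u = rng.normal(size=n)
    t = (u + rng.normal(size=n) > 0).astype(float)
    y = tau * t + beta * u + rng.normal(size=n)
    w = u + leak * t

    # Regress y on an intercept, the treatment, and the proxy.
    X = np.column_stack([np.ones(n), t, w])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]

if __name__ == "__main__":
    for leak in (0.0, 0.25, 0.5):
        print(f"leak={leak:.2f}  estimated effect={estimate_effect(leak):.3f}")
```

In this linear toy model, substituting `u = w - leak * t` into the outcome equation shows that the coefficient on `t` converges to `tau - beta * leak`, so the bias grows in proportion to the amount of leaked treatment signal. Real text proxies are high-dimensional and noisy, so this only conveys the direction of the problem.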
Taxonomy
Topics: Advanced Causal Inference Techniques · Qualitative Comparative Analysis Research · Global Financial Crisis and Policies
