Conceptualizing Treatment Leakage in Text-based Causal Inference
Adel Daoud, Connor T. Jerzak, Richard Johansson

TL;DR
This paper explores the problem of treatment leakage in text-based causal inference, demonstrating how it biases treatment effect estimates and proposing a text distillation method to mitigate this issue.
Contribution
The paper formally defines treatment leakage in text-based causal inference, analyzes its impact on treatment effect estimates, and introduces text distillation, a pre-processing technique that mitigates the resulting bias.
Findings
Treatment leakage biases estimates of the average treatment effect (ATE).
Text distillation reduces the bias caused by treatment leakage.
Simulation results confirm the effectiveness of text distillation.
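
The first two findings can be illustrated with a stylized linear simulation. This is a minimal sketch under strong assumptions and is not the paper's own simulation design. The "text" is reduced to a single scalar feature, the leakage strength gamma, the effect size tau, and the helper ols_treatment_coef are hypothetical, and the "distilled" feature is obtained by oracle subtraction of the leaked component, whereas the paper's distillation operates on the text itself. In this setup, adjusting for a feature w = u + gamma * t shifts the estimated coefficient on treatment from tau toward tau - gamma.

```python
import numpy as np

def ols_treatment_coef(y, t, w=None):
    """Return the OLS coefficient on t, optionally adjusting for a text feature w.

    Hypothetical helper for this sketch only.
    """
    cols = [np.ones_like(y), t]
    if w is not None:
        cols.append(w)
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]

rng = np.random.default_rng(0)
n = 50_000
tau = 1.0    # true ATE (assumed value)
gamma = 0.5  # leakage strength (assumed value)

u = rng.normal(size=n)                    # latent confounder
p = 1.0 / (1.0 + np.exp(-u))              # treatment probability rises with u
t = rng.binomial(1, p).astype(float)      # treatment assignment
y = tau * t + u + rng.normal(scale=0.5, size=n)

w_clean = u                        # text feature carrying only confounder information
w_leaky = u + gamma * t            # text feature that also encodes the treatment
w_distilled = w_leaky - gamma * t  # oracle removal of the leaked part (assumption)

est_naive = ols_treatment_coef(y, t)                   # no adjustment: confounded
est_clean = ols_treatment_coef(y, t, w_clean)          # close to tau
est_leaky = ols_treatment_coef(y, t, w_leaky)          # close to tau - gamma
est_distilled = ols_treatment_coef(y, t, w_distilled)  # close to tau again

print(f"naive={est_naive:.3f} clean={est_clean:.3f} "
      f"leaky={est_leaky:.3f} distilled={est_distilled:.3f}")
```

The bias in the leaky case follows from substituting u = w - gamma * t into the outcome equation, which gives y = (tau - gamma) * t + w + noise. Real text representations are high-dimensional and nonlinear, so the size and direction of the bias in practice need not match this toy case.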
Abstract
Causal inference methods that control for text-based confounders are becoming increasingly important in the social sciences and other disciplines where text is readily available. However, these methods rely on a critical assumption that there is no treatment leakage: that is, the text only contains information about the confounder and no information about treatment assignment. When this assumption does not hold, methods that control for text to adjust for confounders face the problem of post-treatment (collider) bias. However, the assumption that there is no treatment leakage may be unrealistic in real-world situations involving text, as human language is rich and flexible. Language appearing in a public policy document or health records may refer to the future and the past simultaneously, and thereby reveal information about the treatment assignment. In this article, we define the…
Taxonomy
Topics: Advanced Causal Inference Techniques
