Causal Estimation for Text Data with (Apparent) Overlap Violations
Lin Gui, Victor Veitch

TL;DR
This paper introduces a method for estimating the causal effect of a text attribute when overlap is apparently violated. It learns a representation that retains confounding information but discards information that is only predictive of the treatment, so that adjusting for the representation can satisfy overlap and support robust causal inference.
Contribution
It proposes a supervised representation learning approach to address overlap violations in causal estimation with text data, ensuring valid adjustment and uncertainty quantification.
Findings
Significant bias reduction compared to baseline methods
Improved uncertainty quantification in causal estimates
Robustness to misestimation of the conditional outcome model
Abstract
Consider the problem of estimating the causal effect of some attribute of a text document; for example: what effect does writing a polite vs. rude email have on response time? To estimate a causal effect from observational data, we need to adjust for confounding aspects of the text that affect both the treatment and outcome -- e.g., the topic or writing level of the text. These confounding aspects are unknown a priori, so it seems natural to adjust for the entirety of the text (e.g., using a transformer). However, causal identification and estimation procedures rely on the assumption of overlap: for all levels of the adjustment variables, there is randomness leftover so that every unit could have (not) received treatment. Since the treatment here is itself an attribute of the text, it is perfectly determined, and overlap is apparently violated. The purpose of this paper is to show how…
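The overlap issue described in the abstract can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the paper's method: the data generator, the field names (`topic`, `polite`, `text`), and the helper `empirical_propensity` are all hypothetical, and the hand-picked `topic` variable only stands in for a representation that keeps confounding information while dropping the treatment itself. Conditioning on the full text gives propensities of exactly 0 or 1, while conditioning on the topic alone leaves them strictly between 0 and 1.

```python
import random
from collections import defaultdict


def empirical_propensity(units, key):
    """Estimate P(polite = 1 | key(unit)) from group frequencies."""
    counts = defaultdict(lambda: [0, 0])  # group -> [n_treated, n_total]
    for u in units:
        k = key(u)
        counts[k][0] += u["polite"]
        counts[k][1] += 1
    return {k: treated / total for k, (treated, total) in counts.items()}


def make_units(n=2000, seed=0):
    """Toy emails (assumed setup): topic confounds politeness, and the
    text reveals both the topic and the treatment."""
    rng = random.Random(seed)
    units = []
    for _ in range(n):
        topic = rng.choice(["billing", "support"])
        p_polite = 0.7 if topic == "billing" else 0.3
        polite = 1 if rng.random() < p_polite else 0
        opener = "please" if polite else "now"
        text = f"{opener} fix my {topic} issue"
        units.append({"topic": topic, "polite": polite, "text": text})
    return units


units = make_units()

# Adjusting for the whole text: the treatment is a function of the text,
# so every group is either all treated or all untreated.
by_text = empirical_propensity(units, key=lambda u: u["text"])

# Adjusting for the confounder only (a stand-in for a learned
# representation): each group contains both treated and untreated units.
by_topic = empirical_propensity(units, key=lambda u: u["topic"])

overlap_text = all(0.0 < p < 1.0 for p in by_text.values())
overlap_topic = all(0.0 < p < 1.0 for p in by_topic.values())
print("overlap when adjusting for full text:", overlap_text)
print("overlap when adjusting for topic only:", overlap_topic)
```

In real text data the confounders are not known in advance, which is why the paper proposes learning the representation rather than selecting it by hand as this sketch does.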
Taxonomy
Topics: Advanced Causal Inference Techniques · Bayesian Modeling and Causal Inference
