Text Rationalization for Robust Causal Effect Estimation
Lijinghua Zhang, Hengrui Cai

TL;DR
This paper introduces CATR, a framework that selects essential tokens from text data to improve the accuracy and stability of causal effect estimates when text is used to adjust for confounding, especially when the text is represented by high-dimensional features.
Contribution
The paper proposes a novel confounding-aware token rationalization method that reduces textual feature dimensionality while preserving confounding information for causal inference.
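To see why fewer features can ease overlap problems, consider exact-match strata built from binary token indicators. Positivity requires every stratum to contain both treated and control documents. What follows is a toy illustration under our own assumptions, not CATR's selection procedure. The tokens, treatment labels, and the `overlap_fraction` helper are hypothetical: "refund" stands in for a true confounder and the other tokens stand in for spurious features.

```python
from collections import defaultdict


def overlap_fraction(docs, treatment, vocab):
    """Share of units whose exact-match stratum contains both treated and control units.

    Hypothetical helper: each document (a set of tokens) is reduced to a tuple of
    0/1 indicators, one per token in `vocab`, and units are grouped by that tuple.
    """
    if len(docs) != len(treatment):
        raise ValueError("docs and treatment must have the same length")
    arms_by_stratum = defaultdict(set)  # stratum key -> treatment arms observed in it
    keys = []
    for doc, t in zip(docs, treatment):
        key = tuple(int(tok in doc) for tok in vocab)
        keys.append(key)
        arms_by_stratum[key].add(t)
    # A unit has overlap if its stratum contains at least one treated and one control unit.
    covered = sum(1 for key in keys if arms_by_stratum[key] == {0, 1})
    return covered / len(docs)


# Hypothetical toy corpus: "refund" plays the confounder; other tokens are spurious.
docs = [
    {"refund", "delay"},
    {"refund", "thanks"},
    {"refund"},
    {"shipping", "wow"},
    {"shipping"},
    {"shipping", "thanks"},
]
treatment = [1, 0, 1, 0, 1, 0]

print(overlap_fraction(docs, treatment, ["refund"]))                           # 1.0
print(overlap_fraction(docs, treatment, ["refund", "delay", "thanks", "wow"]))  # 0.0
```

Conditioning on the single confounding token leaves both strata with treated and control documents. Adding three spurious tokens splits the six documents into singleton strata with no within-stratum comparison. This is the dimensionality-driven loss of overlap that token selection is meant to avoid.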
Findings
Improves the stability of causal effect estimates.
Reduces the impact of positivity violations.
Shows superior performance on both synthetic and real data.
Abstract
Recent advances in natural language processing have enabled the increasing use of text data in causal inference, particularly for adjusting for confounding factors in treatment effect estimation. Although high-dimensional text can encode rich contextual information, it also poses unique challenges for causal identification and estimation. In particular, the positivity assumption, which requires sufficient treatment overlap across confounder values, is often violated at the observational level when massive text is represented in high-dimensional feature spaces. Redundant or spurious textual features inflate dimensionality, producing extreme propensity scores, unstable weights, and inflated variance in effect estimates. We address these challenges with Confounding-Aware Token Rationalization (CATR), a framework that selects a sparse, necessary subset of tokens using a residual-independence diagnostic designed…
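The abstract's chain from extreme propensity scores to unstable weights can be made concrete with a short sketch. It uses standard inverse-propensity weights and Kish's effective sample size, a common weighting diagnostic that this summary does not mention and that we assume here for illustration. The propensity scores are hand-picked rather than fitted to text, and nothing here reproduces CATR or its residual-independence diagnostic.

```python
def ipw_weights(treatment, propensity):
    """Inverse-propensity weights: 1/e(x) for treated units, 1/(1 - e(x)) for controls."""
    if len(treatment) != len(propensity):
        raise ValueError("treatment and propensity must have the same length")
    weights = []
    for t, e in zip(treatment, propensity):
        if not 0.0 < e < 1.0:
            raise ValueError("propensity scores must lie strictly in (0, 1)")
        weights.append(1.0 / e if t == 1 else 1.0 / (1.0 - e))
    return weights


def effective_sample_size(weights):
    """Kish's effective sample size: (sum of w)^2 / (sum of w^2)."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)


treatment = [1, 0, 1, 0]

# Good overlap: every unit has a moderate chance of either treatment.
moderate = ipw_weights(treatment, [0.5, 0.5, 0.5, 0.5])
# Near-violation of positivity: two units receive a treatment they almost never get.
extreme = ipw_weights(treatment, [0.99, 0.99, 0.01, 0.01])

print(max(moderate), effective_sample_size(moderate))        # 2.0 4.0
print(round(max(extreme)), round(effective_sample_size(extreme), 2))  # 100 2.04
```

With the same four units, propensity scores near 0 and 1 produce weights of about 100 and halve the effective sample size, so the weighted estimate leans heavily on two observations. That concentration is what drives the inflated variance described above.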
Taxonomy
Topics: Advanced Causal Inference Techniques · Bayesian Modeling and Causal Inference · Explainable Artificial Intelligence (XAI)
