Adapting Text Embeddings for Causal Inference
Victor Veitch, Dhanya Sridhar, David M. Blei

TL;DR
This paper introduces causally sufficient text embeddings that enable accurate causal inference from observational text data by reducing dimensionality while preserving causal information, demonstrated through real-world examples.
Contribution
It develops a novel method combining supervised dimensionality reduction and language modeling to create low-dimensional embeddings suitable for causal analysis.
Findings
Causally sufficient embeddings improve causal effect estimation.
The method effectively adjusts for confounding in observational text data.
Applications include estimating the effect of including a theorem on paper acceptance and the effect of labeling a post with the author's gender on post popularity.
Abstract
Does adding a theorem to a paper affect its chance of acceptance? Does labeling a post with the author's gender affect the post popularity? This paper develops a method to estimate such causal effects from observational text data, adjusting for confounding features of the text such as the subject or writing quality. We assume that the text suffices for causal adjustment but that, in practice, it is prohibitively high-dimensional. To address this challenge, we develop causally sufficient embeddings, low-dimensional document representations that preserve sufficient information for causal identification and allow for efficient estimation of causal effects. Causally sufficient embeddings combine two ideas. The first is supervised dimensionality reduction: causal adjustment requires only the aspects of text that are predictive of both the treatment and outcome. The second is efficient…
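To make "adjusting for confounding" concrete, here is a minimal sketch of two standard adjustment estimators of an average treatment effect. It is not taken from the paper. It assumes that models fitted on the document embeddings have already produced, for each document, a propensity score g (probability of treatment) and predicted outcomes q0 and q1 (without and with treatment). The function names plugin_ate and aipw_ate, the clipping threshold eps, and the toy numbers are all hypothetical.

```python
from statistics import fmean


def plugin_ate(q0, q1):
    """Plug-in estimate: mean difference of predicted outcomes (treated minus control)."""
    return fmean(b - a for a, b in zip(q0, q1))


def aipw_ate(t, y, g, q0, q1, eps=1e-3):
    """Augmented inverse-probability-weighted (doubly robust) estimate.

    t: observed treatments (0 or 1), y: observed outcomes,
    g: estimated propensity scores, q0/q1: predicted outcomes under control/treatment.
    """
    terms = []
    for ti, yi, gi, a, b in zip(t, y, g, q0, q1):
        gi = min(max(gi, eps), 1 - eps)  # clip propensities away from 0 and 1
        # Residual correction, weighted by the inverse probability of the observed arm.
        correction = ti * (yi - b) / gi - (1 - ti) * (yi - a) / (1 - gi)
        terms.append(b - a + correction)
    return fmean(terms)


# Toy data (made up): four documents, two treated and two untreated.
t = [1, 0, 1, 0]
y = [3, 1, 4, 2]
g = [0.5, 0.5, 0.5, 0.5]
q0 = [1, 1, 2, 2]
q1 = [3, 3, 4, 4]

print(plugin_ate(q0, q1))
print(aipw_ate(t, y, g, q0, q1))
```

The plug-in estimate relies entirely on the outcome model. The augmented estimate adds a propensity-weighted residual correction, so it remains consistent if either the outcome model or the propensity model is correct. Both need only per-document predictions, which is why a low-dimensional representation that predicts treatment and outcome well is enough for adjustment.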
Taxonomy
Topics: Topic Modeling · Bayesian Modeling and Causal Inference · Data Quality and Management
