TL;DR
This paper systematically evaluates methods for matching text documents in causal inference, identifying approaches that improve match quality and developing a predictive model to estimate match quality based on human judgments.
Contribution
The paper introduces a framework for text matching, evaluates over 100 methods in a systematic human-subjects experiment, and develops a predictive model for assessing match quality.
Findings
Several of the evaluated methods produce matches that human raters judge to be of higher quality than those from existing techniques.
A predictive model successfully mimics human judgment of match quality.
Text matching improves causal inference in media bias and medical studies.
Abstract
Matching for causal inference is a well-studied problem, but standard methods fail when the units to match are text documents: the high-dimensional and rich nature of the data renders exact matching infeasible, causes propensity scores to produce incomparable matches, and makes assessing match quality difficult. In this paper, we characterize a framework for matching text documents that decomposes existing methods into: (1) the choice of text representation, and (2) the choice of distance metric. We investigate how different choices within this framework affect both the quantity and quality of matches identified through a systematic multifactor evaluation experiment using human subjects. Altogether we evaluate over 100 unique text matching methods along with 5 comparison methods taken from the literature. Our experimental results identify methods that generate matches with higher…
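The two-part decomposition described in the abstract (a text representation plus a distance metric) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the term-count representation, the cosine distance, the greedy 1:1 nearest-neighbor procedure, the caliper value, and the function names `bag_of_words`, `cosine_distance`, and `match_documents`.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Representation step (assumed): lowercase whitespace tokens as term counts."""
    return Counter(text.lower().split())


def cosine_distance(a, b):
    """Distance step (assumed): 1 minus cosine similarity of two count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat an empty document as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def match_documents(treated, control, caliper=0.6):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    Returns a list of (treated_index, control_index, distance) tuples.
    A treated document is left unmatched if its nearest available
    control document is farther away than the caliper.
    """
    treated_vecs = [bag_of_words(d) for d in treated]
    control_vecs = [bag_of_words(d) for d in control]
    available = set(range(len(control_vecs)))
    matches = []
    for i, tv in enumerate(treated_vecs):
        if not available:
            break
        # Ties on distance are broken by the lower control index.
        j = min(available, key=lambda k: (cosine_distance(tv, control_vecs[k]), k))
        d = cosine_distance(tv, control_vecs[j])
        if d <= caliper:
            matches.append((i, j, d))
            available.remove(j)
    return matches
```

Swapping `bag_of_words` or `cosine_distance` for another representation or metric changes the method without touching the matching loop, which is the sense in which the framework treats the two choices as separate factors. Tightening the caliper trades the number of matches for their closeness.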
Taxonomy
Methods, Causal inference
