Matching with Text Data: An Experimental Evaluation of Methods for Matching Documents and of Measuring Match Quality

Reagan Mozer; Luke Miratrix; Aaron Russell Kaufman; L. Jason Anastasopoulos

arXiv:1801.00644·stat.ME·March 15, 2019

TL;DR

This paper systematically evaluates methods for matching text documents in causal inference, identifying approaches that improve match quality and developing a predictive model to estimate match quality based on human judgments.

Contribution

It introduces a framework for text matching, conducts a comprehensive evaluation of over 100 methods, and develops a predictive model for match quality assessment.

Findings

1. Certain methods outperform existing techniques in subjective match quality.
2. A predictive model successfully mimics human judgment of match quality.
3. Text matching improves causal inference in media bias and medical studies.

Abstract

Matching for causal inference is a well-studied problem, but standard methods fail when the units to match are text documents: the high-dimensional and rich nature of the data renders exact matching infeasible, causes propensity scores to produce incomparable matches, and makes assessing match quality difficult. In this paper, we characterize a framework for matching text documents that decomposes existing methods into: (1) the choice of text representation, and (2) the choice of distance metric. We investigate how different choices within this framework affect both the quantity and quality of matches identified through a systematic multifactor evaluation experiment using human subjects. Altogether we evaluate over 100 unique text matching methods along with 5 comparison methods taken from the literature. Our experimental results identify methods that generate matches with higher…
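
The two-part decomposition described in the abstract (a text representation plus a distance metric) can be illustrated with a minimal sketch. This is not the authors' implementation and is not taken from the textmatch repository: the term-frequency representation, cosine distance, nearest-neighbor pairing with replacement, the caliper value, and all function names below are assumptions chosen purely for illustration.

```python
import math
from collections import Counter


def term_frequency(doc):
    """Representation step (assumed): lowercase bag-of-words counts."""
    return Counter(doc.lower().split())


def cosine_distance(a, b):
    """Distance step (assumed): 1 minus cosine similarity of two count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat an empty document as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def match_documents(treated, control, caliper=0.5):
    """Pair each treated document with its nearest control document.

    Matching is with replacement; pairs whose distance exceeds the
    (hypothetical) caliper are discarded.
    Returns a list of (treated_index, control_index, distance) tuples.
    """
    control_vecs = [term_frequency(d) for d in control]
    if not control_vecs:
        return []
    matches = []
    for i, doc in enumerate(treated):
        vec = term_frequency(doc)
        dists = [cosine_distance(vec, c) for c in control_vecs]
        j = min(range(len(dists)), key=dists.__getitem__)
        if dists[j] <= caliper:
            matches.append((i, j, dists[j]))
    return matches


treated = [
    "the senate passed the budget bill",
    "local team wins championship game",
]
control = [
    "budget bill passed by the senate",
    "weather forecast predicts rain",
]
matches = match_documents(treated, control)
```

Swapping `term_frequency` or `cosine_distance` for another representation or metric changes the matching method without touching the pairing logic, which is the sense in which the two choices can be varied independently.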

Code & Models

Repositories

reaganmozer/textmatch

Taxonomy

Methods · Causal inference