Propensity Score Matching: Should We Use It in Designing Observational Studies?
Fei Wan

TL;DR
Propensity Score Matching (PSM) is widely used in observational studies to mimic randomized experiments, but recent findings suggest it can paradoxically increase imbalance and bias, which this paper clarifies and addresses.
Contribution
This paper clarifies the PSM paradox, demonstrating that common metrics misrepresent chance imbalance and that the paradox is not a valid concern, supporting continued use of PSM.
Findings
Within matched pairs, covariate differences arise by chance and average out over many pairs.
Common imbalance metrics reflect chance variability rather than true imbalance, and that variability grows as the matched sample is pruned.
Model uncertainty leads to biased estimates; matching reduces this bias rather than increasing it.
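The first two findings can be illustrated with a small simulation. This is a minimal sketch, not the paper's own simulation. It assumes an idealized setting in which every pair is matched exactly on the propensity score, so the covariate values of the treated and control units within a pair behave like independent draws from the same distribution (standard normal here). The function name `chance_imbalance` and all parameter values are hypothetical.

```python
import numpy as np

def chance_imbalance(n_pairs, n_reps=2000, seed=0):
    """Simulate n_reps matched datasets of n_pairs pairs each.

    Assumption: within a pair, treated and control covariates are
    independent draws from the same N(0, 1) distribution, i.e. there
    is no systematic imbalance, only chance differences.
    """
    rng = np.random.default_rng(seed)
    treated = rng.standard_normal((n_reps, n_pairs))
    control = rng.standard_normal((n_reps, n_pairs))
    diff = treated - control

    # Pairwise metric: average absolute within-pair covariate difference.
    mean_abs_pair_diff = np.abs(diff).mean()
    # Group-level metric: difference in covariate means, one per dataset.
    mean_diff = diff.mean(axis=1)
    return mean_abs_pair_diff, mean_diff.mean(), mean_diff.std()

# Fewer pairs mimics pruning the matched sample.
for n in (1000, 250, 50):
    pair_gap, avg_mean_diff, sd_mean_diff = chance_imbalance(n)
    print(f"pairs={n:5d}  mean |pair diff|={pair_gap:.3f}  "
          f"avg mean diff={avg_mean_diff:+.4f}  sd of mean diff={sd_mean_diff:.3f}")
```

Under these assumptions the average within-pair gap stays well above zero regardless of the number of pairs, the difference in group means is centered on zero, and the spread of that difference widens as fewer pairs are retained. A metric that tracks this spread would therefore rise under pruning even though no systematic imbalance has been introduced.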
Abstract
Propensity Score Matching (PSM) stands as a widely embraced method in comparative effectiveness research. PSM crafts matched datasets, mimicking some attributes of randomized designs, from observational data. In a valid PSM design where all baseline confounders are measured and matched, the confounders would be balanced, allowing the treatment status to be considered as if it were randomly assigned. Nevertheless, recent research has unveiled a different facet of PSM, termed "the PSM paradox." As PSM approaches exact matching by progressively pruning matched sets in order of decreasing propensity score distance, it can paradoxically lead to greater covariate imbalance, heightened model dependence, and increased bias, contrary to its intended purpose. Methods: We used analytic formulas, simulation, and the literature to demonstrate that this paradox stems from the misuse of metrics for…
Taxonomy
Topics: Advanced Causal Inference Techniques
