Demystifying Statistical Matching Algorithms for Big Data
Sanjeewani Weerasingha, Michael J. Higgins

TL;DR
This paper explains common algorithms for statistical matching in big data, emphasizing their efficiency and advocating for sparsity-based methods to improve scalability in large datasets.
Contribution
It provides a detailed overview of matching algorithms, especially those for matching without replacement, and discusses their scalability and the potential of sparsity for handling large datasets.
Findings
Matching algorithms vary in efficiency with data size
Sparsity can improve scalability of statistical matching
Detailed analysis of algorithms for big data applications
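To make the sparsity finding concrete, here is a minimal Python sketch. It assumes one common way of imposing sparsity: a caliper, which only allows a treated unit to be paired with controls whose covariate value lies within a fixed distance of its own. This summary does not say which sparsity device the paper uses, and the function name `candidate_edges` and the toy data are hypothetical.

```python
from bisect import bisect_left, bisect_right

def candidate_edges(treated, controls, caliper):
    """Return (treated index, control index) pairs whose covariate distance
    is at most `caliper` (a hypothetical sparsity rule for illustration).
    Sorting the controls once lets each treated unit locate its eligible
    controls by binary search instead of scanning every control."""
    # Control indices ordered by covariate value, plus the sorted values.
    order = sorted(range(len(controls)), key=lambda j: controls[j])
    sorted_vals = [controls[j] for j in order]
    edges = []
    for i, x in enumerate(treated):
        # Positions of the first and one-past-last control inside the caliper.
        lo = bisect_left(sorted_vals, x - caliper)
        hi = bisect_right(sorted_vals, x + caliper)
        edges.extend((i, order[k]) for k in range(lo, hi))
    return edges

treated = [0.10, 0.45, 0.80]
controls = [0.05, 0.12, 0.40, 0.50, 0.95, 0.78]
dense = len(treated) * len(controls)  # every treated-control pair
sparse = candidate_edges(treated, controls, caliper=0.06)
print(dense, len(sparse))  # 18 5
```

A dense formulation considers all n_t × n_c treated-control pairs; on this toy input the caliper shrinks 18 candidate pairs to 5. The sort and binary searches cost O((n_t + n_c) log n_c) plus the number of edges returned, so any downstream matching step only has to work on the retained edges.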
Abstract
Statistical matching is an effective method for estimating causal effects in which treated units are paired with control units with "similar" values of confounding covariates prior to performing estimation. In this way, matching helps isolate the effect of treatment on response from effects due to the confounding covariates. While there are a large number of software packages to perform statistical matching, the algorithms and techniques used to solve statistical matching problems -- especially matching without replacement -- are not widely understood. In this paper, we describe in detail commonly used algorithms and techniques for solving statistical matching problems. We focus in particular on the efficiency of these algorithms as the number of observations grows large. We advocate for the further development of statistical matching methods that impose and exploit "sparsity" -- by…
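The core operation described in the abstract (pairing each treated unit with a control that has a similar covariate value, without reusing controls) can be illustrated with a minimal Python sketch. It assumes a single covariate and absolute difference as the distance. It contrasts a greedy nearest-neighbor pass with an exhaustive search for the pairing that minimizes total distance; neither function is claimed to be an algorithm from the paper, and both names are hypothetical.

```python
from itertools import permutations

def greedy_match(treated, controls):
    """Greedy 1:1 nearest-neighbor matching without replacement on one
    covariate: each treated unit, in input order, takes the closest control
    not yet used (ties broken by the lower control index)."""
    if len(controls) < len(treated):
        raise ValueError("need at least as many controls as treated units")
    available = set(range(len(controls)))
    pairs = []
    for i, x in enumerate(treated):
        j = min(available, key=lambda c: (abs(controls[c] - x), c))
        available.remove(j)  # without replacement: each control used once
        pairs.append((i, j))
    return pairs

def optimal_match_bruteforce(treated, controls):
    """Exhaustively search all assignments of distinct controls to treated
    units and return the one with the smallest total absolute distance.
    Factorial cost, so only usable on tiny inputs."""
    best = None
    for perm in permutations(range(len(controls)), len(treated)):
        cost = sum(abs(treated[i] - controls[j]) for i, j in enumerate(perm))
        if best is None or cost < best[0]:
            best = (cost, list(enumerate(perm)))
    return best[1]

treated = [0.30, 0.32]
controls = [0.31, 0.90, 0.10]
pairs = greedy_match(treated, controls)
total = sum(abs(treated[i] - controls[j]) for i, j in pairs)
print(pairs, round(total, 2))  # [(0, 0), (1, 2)] 0.23
print(optimal_match_bruteforce(treated, controls))  # [(0, 2), (1, 0)]
```

On this toy input the greedy pass gives the closest control to the first treated unit and ends with a total distance of 0.23, while the best assignment achieves 0.21, which shows why matching without replacement is a harder problem than matching with replacement. The brute-force search is only for illustration; optimal matching without replacement can be posed as an assignment problem, which in practice is solved with dedicated methods such as the Hungarian algorithm or minimum-cost network flow (SciPy's `scipy.optimize.linear_sum_assignment` is one available solver).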
Taxonomy
Topics: Advanced Causal Inference Techniques · Statistical Methods and Inference · Bayesian Modeling and Causal Inference
