Demystifying Statistical Matching Algorithms for Big Data

Sanjeewani Weerasingha; Michael J. Higgins

arXiv:2309.05859·stat.ME·September 13, 2023

Open Access

TL;DR

This paper explains common algorithms for statistical matching in big data, emphasizing their efficiency and advocating for sparsity-based methods to improve scalability in large datasets.

Contribution

The paper gives a detailed overview of matching algorithms, especially those for matching without replacement, and discusses how well they scale and how sparsity could help them handle large data sets.

Findings

01. Matching algorithms vary in efficiency with data size

02. Sparsity can improve scalability of statistical matching

03. Detailed analysis of algorithms for big data applications

Abstract

Statistical matching is an effective method for estimating causal effects in which treated units are paired with control units that have "similar" values of confounding covariates before estimation is performed. In this way, matching helps isolate the effect of treatment on response from effects due to the confounding covariates. While many software packages perform statistical matching, the algorithms and techniques used to solve statistical matching problems -- especially matching without replacement -- are not widely understood. In this paper, we describe in detail commonly used algorithms and techniques for solving statistical matching problems. We focus in particular on the efficiency of these algorithms as the number of observations grows large. We advocate for the further development of statistical matching methods that impose and exploit "sparsity" -- by…
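
As a rough illustration of the pairing step the abstract describes, the sketch below performs greedy 1:1 nearest-neighbor matching without replacement in plain Python. It is not taken from the paper: the function name `greedy_match`, the use of Euclidean distance, the tie-breaking rule, and the toy covariate values are all assumptions chosen for illustration.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def greedy_match(treated, controls):
    """Greedy 1:1 matching without replacement (illustrative sketch only).

    treated, controls: lists of covariate tuples of equal dimension.
    Returns a list of (treated_index, control_index) pairs.
    """
    available = set(range(len(controls)))  # controls not yet used
    pairs = []
    for t_idx, t_cov in enumerate(treated):
        if not available:
            break  # no controls left; remaining treated units stay unmatched
        # Nearest remaining control; ties broken by the lower control index.
        best = min(available, key=lambda c: (dist(t_cov, controls[c]), c))
        pairs.append((t_idx, best))
        available.remove(best)  # "without replacement": each control used once
    return pairs


# Hypothetical toy data: two covariates per unit.
treated = [(1.0, 2.0), (4.0, 4.0)]
controls = [(0.9, 2.1), (5.0, 5.0), (4.1, 3.9), (1.2, 1.8)]
pairs = greedy_match(treated, controls)
print(pairs)
```

In this sketch each treated unit scans every remaining control, so the work grows with the product of the two group sizes. A greedy pass of this kind also does not, in general, minimize the total matched distance.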

Taxonomy

Topics: Advanced Causal Inference Techniques · Statistical Methods and Inference · Bayesian Modeling and Causal Inference