Inference from Non-Random Samples Using Bayesian Machine Learning
Yutao Liu, Andrew Gelman, Qixuan Chen

TL;DR
This paper introduces a Bayesian machine learning approach for inference from non-random samples, leveraging auxiliary variables and propensity scores to improve population estimates with valid uncertainty quantification.
Contribution
It develops a regularized prediction method using Bayesian additive regression trees that accounts for non-random sampling and incorporates propensity scores for better inference.
Findings
Valid population mean inference achieved in simulations
Coverage rates close to nominal levels
Effective application demonstrated in survey and epidemiology data
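For readers unfamiliar with the term, a coverage rate is the fraction of repeated simulations in which an interval estimate contains the true value. The sketch below is only a generic illustration, not the paper's simulation design: it draws simple random samples from a normal distribution and checks a normal-approximation 95% interval for the mean. The function name `coverage_rate` and all settings are hypothetical.

```python
import random
import statistics


def coverage_rate(reps=1000, n=50, true_mean=0.0, seed=0):
    """Share of simulated 95% intervals that contain the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mean, 1.0) for _ in range(n)]
        centre = statistics.fmean(sample)
        # Normal-approximation interval: estimate +/- 1.96 standard errors.
        half_width = 1.96 * statistics.stdev(sample) / n ** 0.5
        hits += centre - half_width <= true_mean <= centre + half_width
    return hits / reps


rate = coverage_rate()
print(f"empirical coverage of the nominal 95% interval: {rate:.3f}")
```

A rate near 0.95 means the intervals are well calibrated; a rate well below it signals bias in the estimate or understated uncertainty.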
Abstract
We consider inference from non-random samples in data-rich settings where high-dimensional auxiliary information is available both in the sample and the target population, with survey inference being a special case. We propose a regularized prediction approach that predicts the outcomes in the population using a large number of auxiliary variables such that the ignorability assumption is reasonable while the Bayesian framework is straightforward for quantification of uncertainty. Besides the auxiliary variables, inspired by Little & An (2004), we also extend the approach by estimating the propensity score for a unit to be included in the sample and also including it as a predictor in the machine learning models. We show through simulation studies that the regularized predictions using soft Bayesian additive regression trees yield valid inference for the population means and coverage…
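The prediction approach in the abstract can be sketched in three steps: estimate each unit's propensity of being sampled from the auxiliary variables, add that estimate as an extra predictor, then predict the outcome for every unsampled unit and average over the population. The code below is a minimal sketch under loud assumptions. The data are simulated. A k-nearest-neighbour average stands in for the soft Bayesian additive regression trees used in the paper, so it returns a point estimate only, with no posterior uncertainty. All function names are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_propensity(X, s, steps=300, lr=1.0):
    """Logistic regression of the inclusion indicator s on X (gradient ascent)."""
    w = [0.0] * (len(X[0]) + 1)  # intercept followed by one slope per column
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, si in zip(X, s):
            resid = si - sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))
            grad[0] += resid
            for j, xj in enumerate(x):
                grad[j + 1] += resid * xj
        w = [wj + lr * gj / len(X) for wj, gj in zip(w, grad)]
    return [sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))) for x in X]


def knn_predict(train_Z, train_y, z, k=10):
    """Mean outcome of the k sampled units nearest to z (stand-in for BART)."""
    nearest = sorted(zip(train_Z, train_y), key=lambda pair: math.dist(pair[0], z))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def run_demo(seed=1, N=1000):
    rng = random.Random(seed)
    # Auxiliary variables are known for every unit in the population.
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(N)]
    # Outcomes exist for everyone here, but only sampled ones are used for fitting.
    y = [2 * x1 + x2 + rng.gauss(0, 0.5) for x1, x2 in X]
    # Non-random selection: units with larger x1 are more likely to be sampled.
    s = [1 if rng.random() < sigmoid(-1 + x1) else 0 for x1, _ in X]

    p_hat = fit_propensity(X, s)                 # step 1: propensity scores
    Z = [x + [p] for x, p in zip(X, p_hat)]      # step 2: add score as a predictor
    train_Z = [z for z, si in zip(Z, s) if si]
    train_y = [yi for yi, si in zip(y, s) if si]

    # Step 3: keep observed outcomes, predict the rest, average over the population.
    filled = [yi if si else knn_predict(train_Z, train_y, z)
              for yi, si, z in zip(y, s, Z)]
    truth = sum(y) / N
    naive = sum(train_y) / len(train_y)
    estimate = sum(filled) / N
    return truth, naive, estimate


truth, naive, estimate = run_demo()
print(f"population mean {truth:.3f} | sample mean {naive:.3f} | prediction estimate {estimate:.3f}")
```

Because selection depends on a variable that also drives the outcome, the unadjusted sample mean is biased, while the prediction estimate should land much closer to the population mean. Swapping the stand-in learner for a Bayesian model would add the posterior draws needed for interval estimates.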
Taxonomy
Topics: Statistical Methods and Bayesian Inference · Statistical Methods and Inference · Advanced Causal Inference Techniques
