Gradient-Boosted Pseudo-Weighting: Methods for Population Inference from Nonprobability Samples

Kangrui Liu; Lingxiao Wang; Yan Li

arXiv:2508.00089·stat.ME·August 8, 2025

TL;DR

This paper introduces gradient-boosted pseudo-weighting methods to improve population inference from nonprobability samples, addressing bias and model flexibility issues in propensity score adjustments.

Contribution

The paper proposes using gradient boosting machines to estimate the propensity scores used in pseudo-weighting, giving the propensity model more flexibility than traditional logistic regression.
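To make the idea concrete, here is a minimal sketch of boosted propensity estimation followed by pseudo-weighting. Everything in it is an assumption made for illustration rather than the authors' estimator: the toy data are invented, the booster is a bare-bones implementation using depth-1 trees with one-step Newton leaf values, the helper name `fit_boosted_propensity` is hypothetical, and the pseudo-weight is taken as the inverse odds `(1 - p) / p`, one common odds-based construction. The nonprobability sample (indicator 1, weight 1) is stacked with a probability reference sample (indicator 0, carrying its survey weights), and the propensity of membership in the nonprobability sample is fitted on the stacked data.

```python
import math

def sigmoid(f):
    return 1.0 / (1.0 + math.exp(-f))

def fit_boosted_propensity(x, z, w, n_rounds=200, learning_rate=0.3):
    """Boost stumps on weighted logistic loss; returns a function x -> P(z=1 | x)."""
    pos_w = sum(wi for wi, zi in zip(w, z) if zi == 1)
    f0 = math.log(pos_w / (sum(w) - pos_w))   # weighted log-odds as the starting score
    scores = [f0] * len(x)
    stumps = []
    thresholds = sorted(set(x))[:-1]          # every split leaves both sides non-empty
    for _ in range(n_rounds):
        p = [sigmoid(s) for s in scores]
        grad = [zi - pi for zi, pi in zip(z, p)]   # negative gradient of the log-loss
        hess = [pi * (1.0 - pi) for pi in p]
        best = None
        for t in thresholds:
            side = [xi <= t for xi in x]
            gain, leaves = 0.0, []
            for flag in (True, False):
                g = sum(wi * gi for wi, gi, s in zip(w, grad, side) if s == flag)
                sw = sum(wi for wi, s in zip(w, side) if s == flag)
                h = sum(wi * hi for wi, hi, s in zip(w, hess, side) if s == flag)
                gain += g * g / sw            # reduction in weighted squared error
                leaves.append(g / h)          # one-step Newton leaf value
            if best is None or gain > best[0]:
                best = (gain, t, leaves[0], leaves[1])
        _, t, left, right = best
        stumps.append((t, left, right))
        scores = [s + learning_rate * (left if xi <= t else right)
                  for s, xi in zip(scores, x)]

    def predict(xi):
        f = f0 + sum(learning_rate * (l if xi <= t else r) for t, l, r in stumps)
        return sigmoid(f)

    return predict

# Toy data (entirely made up): four covariate values, outcome y equals x.
nonprob_x = [0] * 10 + [1] * 20 + [2] * 30 + [3] * 40   # over-represents large x
nonprob_y = [float(v) for v in nonprob_x]
ref_x = [0, 1, 2, 3] * 5                                 # probability reference sample
ref_w = [200.0] * len(ref_x)                             # survey weights, sum to N = 4000

# Stack the two samples: z = 1 marks nonprobability units, z = 0 reference units.
x = nonprob_x + ref_x
z = [1] * len(nonprob_x) + [0] * len(ref_x)
w = [1.0] * len(nonprob_x) + ref_w

propensity = fit_boosted_propensity(x, z, w)

# Odds-based pseudo-weights for the nonprobability units (an assumed construction).
pseudo_w = [(1 - propensity(v)) / propensity(v) for v in nonprob_x]

naive_mean = sum(nonprob_y) / len(nonprob_y)
weighted_mean = sum(pw * y for pw, y in zip(pseudo_w, nonprob_y)) / sum(pseudo_w)
print(f"naive mean: {naive_mean:.3f}, pseudo-weighted mean: {weighted_mean:.3f}")
```

In this toy setup the reference sample implies a population mean of 1.5 for the outcome, so the two printed values show how far the unweighted and pseudo-weighted estimates sit from that target. In practice a library booster with deeper trees and tuned hyperparameters would replace the hand-written stump loop.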

Findings

01. Gradient-boosted pseudo-weights outperform existing methods in simulations.

02. The approach reduces bias in population mean estimates.

03. The approach is effective in estimating health outcome prevalences.

Abstract

Nonprobability samples have rapidly emerged to address time-sensitive priority topics in a variety of fields. While these data are timely, they are prone to selection bias. To mitigate selection bias, a large body of survey research literature has explored the use of propensity score (PS) adjustment methods to enhance the population representativeness of nonprobability samples, using probability-based survey samples as external references. A recent advancement, the 2-step PS-based pseudo-weighting adjustment method (2PS, Li 2024), has been shown to improve upon recent developments with respect to mean squared error. However, the effectiveness of these methods in reducing bias critically depends on the ability of the underlying propensity model to accurately reflect the true selection process, which is challenging with parametric regression. In this study, we propose a set of pseudo-weight…
