Optimal Sampling for Generalized Linear Model under Measurement Constraint with Surrogate Variables

Yixin Shen; Yang Ning

arXiv:2501.00972 · stat.ME · January 15, 2025

TL;DR

This paper introduces an optimal sampling method for generalized linear models that leverages surrogate variables with measurement errors, improving estimation efficiency under data labeling constraints.

Contribution

The paper develops a sampling strategy that combines surrogate variables with the A-optimality criterion, achieving lower asymptotic variance than existing optimal sampling methods that do not use surrogates.

Findings

01. Outperforms existing sampling algorithms in empirical mean squared error

02. Provides consistent estimators under measurement constraints

03. Enhances robustness in practical scenarios

Abstract

Measurement-constrained datasets, often encountered in semi-supervised learning, arise when data labeling is costly, time-intensive, or hindered by confidentiality or ethical concerns, resulting in a scarcity of labeled data. In certain cases, surrogate variables are accessible across the entire dataset and can serve as approximations to the true response variable; however, these surrogates often contain measurement errors and thus cannot be directly used for accurate prediction. We propose an optimal sampling strategy that effectively harnesses the available information from surrogate variables. This approach provides consistent estimators under the assumption of a generalized linear model, achieving theoretically lower asymptotic variance than existing optimal sampling algorithms that do not use surrogate data information. By employing the A-optimality criterion from optimal…
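To make the setting in the abstract concrete, the following is a minimal sketch of response-free subsampling for logistic regression: sampling probabilities are computed from covariates and a pilot estimate only, a subset is labeled, and an inverse-probability-weighted estimator is fitted. It is an illustration under stated assumptions, not the paper's algorithm. In particular, the score `sqrt(w) * ||M^{-1} x||` is one A-optimality-style choice (A-optimality minimizes the trace of the asymptotic covariance matrix), the surrogate-based probabilities proposed in the paper are not reproduced here, and all names (`response_free_probs`, `weighted_logistic_fit`, the simulated data, the sample sizes) are hypothetical.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def response_free_probs(X, beta_pilot):
    """Sampling probabilities that depend on covariates only (illustrative)."""
    mu = sigmoid(X @ beta_pilot)
    w = mu * (1.0 - mu)                        # logistic variance weight
    M = (X * w[:, None]).T @ X / X.shape[0]    # estimated information matrix
    M_inv_x = np.linalg.solve(M, X.T).T        # rows are M^{-1} x_i
    scores = np.sqrt(w) * np.linalg.norm(M_inv_x, axis=1)
    return scores / scores.sum()

def weighted_logistic_fit(X, y, weights, n_iter=50):
    """Newton's method for a weighted logistic log-likelihood."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = sigmoid(X @ beta)
        grad = X.T @ (weights * (y - mu))
        hess = (X * (weights * mu * (1.0 - mu))[:, None]).T @ X
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < 1e-10:
            break
    return beta

# Simulated data (assumption): intercept plus one standard normal covariate.
rng = np.random.default_rng(0)
N, r_pilot, r = 20000, 500, 2000
X = np.column_stack([np.ones(N), rng.normal(size=N)])
beta_true = np.array([0.5, -1.0])
y = rng.binomial(1, sigmoid(X @ beta_true))    # labels, revealed only when sampled

# Step 1: uniform pilot sample to obtain a rough estimate.
pilot_idx = rng.choice(N, size=r_pilot, replace=False)
beta_pilot = weighted_logistic_fit(X[pilot_idx], y[pilot_idx], np.ones(r_pilot))

# Step 2: response-free probabilities, then sample with replacement.
probs = response_free_probs(X, beta_pilot)
idx = rng.choice(N, size=r, replace=True, p=probs)

# Step 3: inverse-probability-weighted fit on the labeled subsample.
beta_hat = weighted_logistic_fit(X[idx], y[idx], 1.0 / (N * probs[idx]))
print("pilot estimate:", beta_pilot)
print("subsample estimate:", beta_hat)
```

The inverse-probability weights are what keep the subsample estimator consistent under non-uniform sampling. A surrogate-aware variant would, by the abstract's description, also let the sampling probabilities draw on the surrogate observed for every unit; that extension is left out because its exact form is not given in the text above.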

Taxonomy

Topics: Advanced Statistical Methods and Models · Scientific Measurement and Uncertainty Evaluation · Advanced Statistical Process Monitoring