A Unified Framework for Semiparametrically Efficient Semi-Supervised Learning

Zichun Xu; Daniela Witten; Ali Shojaie

arXiv:2502.17741 · math.ST · March 20, 2025

TL;DR

This paper develops a unified semiparametric efficiency framework for semi-supervised inference. It characterizes when, and by how much, unlabeled data can improve on inference from labeled data alone, and proposes estimators with theoretical guarantees that apply to various statistical problems.

Contribution

The paper introduces a general efficiency theory for semi-supervised inference and proposes safe and efficient estimators that incorporate unlabeled data and machine learning predictions.

Findings

01. The safe estimator is at least as efficient as supervised methods.

02. The efficient estimator achieves the semiparametric efficiency bound.

03. Simulations demonstrate improved performance of the proposed estimators.

Abstract

We consider statistical inference under a semi-supervised setting where we have access to both a labeled dataset consisting of pairs $\{X_i, Y_i\}_{i=1}^{n}$ and an unlabeled dataset $\{X_i\}_{i=n+1}^{n+N}$. We ask the question: under what circumstances, and by how much, can incorporating the unlabeled dataset improve upon inference using the labeled data? To answer this question, we investigate semi-supervised learning through the lens of semiparametric efficiency theory. We characterize the efficiency lower bound under the semi-supervised setting for an arbitrary inferential problem, and show that incorporating unlabeled data can potentially improve efficiency if the parameter is not well-specified. We then propose two types of semi-supervised estimators: a safe estimator that imposes minimal assumptions, is simple to compute, and is guaranteed to be at least as efficient as the…
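To make the setting concrete, here is a minimal sketch of how unlabeled covariates can reduce the variance of a simple estimate, the mean of $Y$. It is not the paper's safe or efficient estimator. It is a textbook control-variate adjustment, and the function names, the simulated model $Y = 2X + \text{noise}$, and the sample sizes are all assumptions chosen for illustration.

```python
import random
from statistics import fmean, pvariance


def supervised_mean(y_lab):
    """Labeled-only estimate of E[Y]: the sample mean of the labels."""
    return fmean(y_lab)


def semi_supervised_mean(x_lab, y_lab, x_unlab):
    """Control-variate estimate of E[Y] (illustrative, not the paper's method).

    Shifts the labeled mean of Y by gamma * (labeled mean of X - pooled mean
    of X), where gamma is the least-squares slope of Y on X in the labeled data.
    """
    x_bar_lab = fmean(x_lab)
    y_bar_lab = fmean(y_lab)
    x_bar_all = fmean(list(x_lab) + list(x_unlab))
    var_x = fmean([(x - x_bar_lab) ** 2 for x in x_lab])
    if var_x == 0.0:
        # X carries no information; fall back to the supervised estimate.
        return y_bar_lab
    cov_xy = fmean([(x - x_bar_lab) * (y - y_bar_lab)
                    for x, y in zip(x_lab, y_lab)])
    gamma = cov_xy / var_x
    return y_bar_lab - gamma * (x_bar_lab - x_bar_all)


def compare(n=50, N=1000, reps=300, seed=0):
    """Monte Carlo variance of both estimators under an assumed linear model."""
    rng = random.Random(seed)
    sup, semi = [], []
    for _ in range(reps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n + N)]
        x_lab, x_unlab = x[:n], x[n:]
        y_lab = [2.0 * xi + rng.gauss(0.0, 1.0) for xi in x_lab]  # assumed model
        sup.append(supervised_mean(y_lab))
        semi.append(semi_supervised_mean(x_lab, y_lab, x_unlab))
    return pvariance(sup), pvariance(semi)


if __name__ == "__main__":
    var_sup, var_semi = compare()
    print(f"supervised variance:      {var_sup:.4f}")
    print(f"semi-supervised variance: {var_semi:.4f}")
```

In this simulated setup the adjusted estimator should show a smaller Monte Carlo variance than the labeled-only mean, because $X$ is strongly predictive of $Y$. With no unlabeled data the adjustment term is zero and the two estimates coincide.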

Taxonomy

Topics: Face and Expression Recognition · Advanced Data Compression Techniques · Gaussian Processes and Bayesian Inference