Unlabeled Data Help in Graph-Based Semi-Supervised Learning: A Bayesian Nonparametrics Perspective

Daniel Sanz-Alonso; Ruiyi Yang

arXiv:2008.11809·math.ST·June 15, 2021·J. Mach. Learn. Res.·5 cites

Open Access

TL;DR

This paper provides a Bayesian analysis of graph-based semi-supervised learning, showing that with enough unlabeled data, the posterior concentrates near the true function at near-optimal rates for both regression and classification.

Contribution

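The paper develops a Bayesian nonparametric framework that explains how unlabeled data help graph-based semi-supervised learning: when the prior is constructed with sufficiently many unlabeled data, the posterior contracts at a rate that is minimax optimal up to a logarithmic factor.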

Findings

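01. For a suitable choice of prior, the posterior contracts around the true function at a rate that is minimax optimal up to a logarithmic factor.

02. The theory covers both regression and classification.

03. The result requires a prior constructed with sufficiently many unlabeled data, which is the sense in which unlabeled data help under the Bayesian approach.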

Abstract

In this paper we analyze the graph-based approach to semi-supervised learning under a manifold assumption. We adopt a Bayesian perspective and demonstrate that, for a suitable choice of prior constructed with sufficiently many unlabeled data, the posterior contracts around the truth at a rate that is minimax optimal up to a logarithmic factor. Our theory covers both regression and classification.
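To make the setting concrete, the following is a minimal sketch of graph-based Bayesian regression in which the prior is built from all inputs, labeled and unlabeled. It is not the authors' construction: the epsilon-neighborhood graph, the unnormalized Laplacian, the Matérn-type covariance (tau*I + L)^(-s), the circle used as the manifold, and every parameter value (`eps`, `tau`, `s`, `noise_sd`, `N`, `n`) are assumptions chosen for illustration, since the abstract does not specify them.

```python
import numpy as np

def graph_laplacian(X, eps):
    """Unnormalized Laplacian of an eps-neighborhood graph on the rows of X."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = (d2 <= eps ** 2).astype(float)
    np.fill_diagonal(W, 0.0)          # no self-loops
    return np.diag(W.sum(1)) - W

def prior_covariance(L, tau=1.0, s=2.0):
    """Assumed prior covariance (tau*I + L)^(-s), via the eigendecomposition of L."""
    lam, V = np.linalg.eigh(L)
    lam = np.clip(lam, 0.0, None)     # guard against tiny negative round-off
    return (V * (tau + lam) ** (-s)) @ V.T

def posterior_mean(C, labeled_idx, y, noise_var):
    """Gaussian-process posterior mean at every node given noisy labels."""
    C_ll = C[np.ix_(labeled_idx, labeled_idx)]
    C_al = C[:, labeled_idx]
    A = C_ll + noise_var * np.eye(len(labeled_idx))
    return C_al @ np.linalg.solve(A, y)

rng = np.random.default_rng(0)
N, n = 400, 20                        # N inputs in total, only the first n are labeled
theta = rng.uniform(0.0, 2.0 * np.pi, N)
X = np.column_stack([np.cos(theta), np.sin(theta)])   # points on a circle (the assumed manifold)
f_true = np.sin(theta)                # hypothetical regression function
labeled_idx = np.arange(n)
noise_sd = 0.1
y = f_true[labeled_idx] + noise_sd * rng.normal(size=n)

L = graph_laplacian(X, eps=0.3)       # the graph uses labeled AND unlabeled inputs
C = prior_covariance(L, tau=1.0, s=2.0)
f_hat = posterior_mean(C, labeled_idx, y, noise_sd ** 2)
print("RMSE over all nodes:", np.sqrt(np.mean((f_hat - f_true) ** 2)))
```

The unlabeled inputs enter only through the graph Laplacian, and hence through the prior; the labels enter only through the likelihood. Rerunning the sketch with different values of `N` while keeping `n` fixed gives an informal feel for how the amount of unlabeled data affects the estimate, although it is no substitute for the contraction-rate theory in the paper.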

Taxonomy

Topics: Machine Learning and Data Classification · Statistical Methods and Inference · Bayesian Methods and Mixture Models