Multi-class Probabilistic Bounds for Self-learning

Vasilii Feofanov; Emilie Devijver; Massih-Reza Amini

arXiv:2109.14422·cs.LG·September 30, 2021

TL;DR

This paper introduces a probabilistic framework for analyzing multi-class self-learning, providing bounds on classifier risk and error, and proposes an automatic threshold selection method to improve semi-supervised learning.

Contribution

It offers the first probabilistic bounds for multi-class self-learning and introduces a method to optimize pseudo-labeling thresholds based on these bounds.

Findings

01. The framework effectively bounds the risk of the majority vote classifier.

02. Automatic threshold selection improves pseudo-labeling accuracy.

03. Empirical results outperform several state-of-the-art semi-supervised methods.

Abstract

Self-learning is a classical approach to learning from both labeled and unlabeled observations: it assigns pseudo-labels to unlabeled training instances whose confidence score exceeds a predetermined threshold. However, pseudo-labeling is prone to error and risks adding noisy labels to the unlabeled training data. In this paper, we present a probabilistic framework for analyzing self-learning in the multi-class classification scenario with partially labeled data. First, we derive a transductive bound on the risk of the multi-class majority vote classifier. Based on this result, we propose to automatically choose the pseudo-labeling threshold that minimizes the transductive bound. Then, we introduce a mislabeling error model to analyze the error of the majority vote classifier on pseudo-labeled data. We derive a…
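
The self-learning loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' method: it uses a fixed, hand-picked threshold (the paper instead selects the threshold by minimizing its transductive bound), and a nearest-centroid classifier with a softmax confidence score stands in for the majority vote classifier. All names (`fit_centroids`, `predict_proba`, `self_learning`) and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def fit_centroids(X, y, n_classes):
    # One centroid per class; assumes every class has at least one example.
    return np.stack([X[y == c].mean(axis=0) for c in range(n_classes)])

def predict_proba(centroids, X):
    # Softmax over negative squared distances: a stand-in confidence score,
    # not the vote distribution analyzed in the paper.
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    logits = -d2
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def self_learning(X_lab, y_lab, X_unl, n_classes, threshold=0.9, max_iter=10):
    # threshold is fixed here (an assumption); the paper chooses it automatically.
    X_train, y_train = X_lab.copy(), y_lab.copy()
    remaining = X_unl.copy()
    for _ in range(max_iter):
        if len(remaining) == 0:
            break
        centroids = fit_centroids(X_train, y_train, n_classes)
        proba = predict_proba(centroids, remaining)
        confident = proba.max(axis=1) >= threshold
        if not confident.any():
            break  # nothing passes the threshold, so stop pseudo-labeling
        # Move confident instances, with their pseudo-labels, into the training set.
        X_train = np.vstack([X_train, remaining[confident]])
        y_train = np.concatenate([y_train, proba[confident].argmax(axis=1)])
        remaining = remaining[~confident]
    centroids = fit_centroids(X_train, y_train, n_classes)
    return centroids, y_train, len(remaining)

# Synthetic three-class data: 3 labeled and 50 unlabeled points per class.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])
X_lab = np.vstack([c + 0.5 * rng.standard_normal((3, 2)) for c in centers])
y_lab = np.repeat(np.arange(3), 3)
X_unl = np.vstack([c + 0.5 * rng.standard_normal((50, 2)) for c in centers])

centroids, y_train, n_left = self_learning(X_lab, y_lab, X_unl, n_classes=3)
print(centroids.shape, len(y_train), n_left)
```

A lower threshold pseudo-labels more instances per round but admits more label noise, which is the trade-off the abstract's threshold-selection step is meant to address.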

Taxonomy

Topics: Machine Learning and Data Classification · Domain Adaptation and Few-Shot Learning · Machine Learning and Algorithms

Methods: Self-Learning