Semi-Supervised Learning with Multiple Imputations on Non-Random Missing Labels

Jason Lu; Michael Ma; Huaze Xu; Zixi Xu

arXiv:2308.07562·cs.LG·August 16, 2023

TL;DR

This paper introduces new semi-supervised learning methods that leverage multiple imputations and confidence thresholds to improve accuracy and reduce bias in datasets with non-random missing labels, especially under MNAR conditions.

Contribution

It proposes two novel methods combining multiple imputation models and confidence filtering, including SSL-DI, to address bias and improve performance in non-random missing label scenarios.

Findings

1. Outperforms existing methods in classification accuracy
2. Effectively reduces bias in MNAR and MCAR situations
3. Improves reliability of pseudo-labels through confidence thresholds

Abstract

Semi-Supervised Learning (SSL) trains algorithms on both labeled and unlabeled data. It is a very common application of ML because obtaining a fully labeled dataset is often unrealistic. Researchers have tackled three main missing-label settings: missing at random (MAR), missing completely at random (MCAR), and missing not at random (MNAR). MNAR is the most challenging of the three because one cannot safely assume that all class distributions are equal. Existing methods, including Class-Aware Imputation (CAI) and Class-Aware Propensity (CAP), mostly overlook the non-randomness in the unlabeled data. This paper proposes two new methods that combine multiple imputation models to achieve higher accuracy and less bias. 1) We use multiple imputation models, create confidence intervals, and apply a threshold to ignore pseudo-labels with low confidence. 2) Our new method, SSL with…
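
The first method named in the abstract (several imputation models, a confidence interval, and a threshold on pseudo-labels) might look roughly like the sketch below. It is an illustration under stated assumptions, not the authors' implementation: the function name `filter_pseudo_labels`, the normal-approximation interval, the default `threshold` and `z` values, and the toy probabilities are all hypothetical choices made for this example.

```python
import math
import statistics


def filter_pseudo_labels(model_probs, threshold=0.8, z=1.96):
    """Keep pseudo-labels whose confidence interval clears a threshold.

    model_probs[k][i][c] is the probability that imputation model k
    assigns class c to unlabeled sample i (assumed layout).
    Returns {sample_index: pseudo_label} for the accepted samples.
    """
    n_models = len(model_probs)
    n_samples = len(model_probs[0])
    n_classes = len(model_probs[0][0])
    accepted = {}
    for i in range(n_samples):
        # Average each class probability across the imputation models.
        means = [
            statistics.fmean([model_probs[k][i][c] for k in range(n_models)])
            for c in range(n_classes)
        ]
        label = max(range(n_classes), key=means.__getitem__)
        votes = [model_probs[k][i][label] for k in range(n_models)]
        # Assumed rule: normal-approximation lower bound on the mean confidence.
        spread = statistics.stdev(votes) if n_models > 1 else 0.0
        lower = means[label] - z * spread / math.sqrt(n_models)
        if lower >= threshold:
            accepted[i] = label
    return accepted


# Toy predictions from three hypothetical imputation models on four samples.
probs = [
    [[0.95, 0.05], [0.60, 0.40], [0.10, 0.90], [0.01, 0.99]],
    [[0.93, 0.07], [0.30, 0.70], [0.50, 0.50], [0.40, 0.60]],
    [[0.97, 0.03], [0.55, 0.45], [0.20, 0.80], [0.01, 0.99]],
]
accepted = filter_pseudo_labels(probs)
print(accepted)  # {0: 0}
```

In this toy data, sample 3 has a mean confidence of 0.86 for class 1 but is still rejected, because the models disagree enough that the lower bound of its interval falls below the threshold. Only sample 0, on which all three models agree closely, keeps its pseudo-label.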

Taxonomy

Topics: Machine Learning and Data Classification · Face and Expression Recognition · Domain Adaptation and Few-Shot Learning