Semi-Supervised Learning with Multiple Imputations on Non-Random Missing Labels
Jason Lu, Michael Ma, Huaze Xu, Zixi Xu

TL;DR
This paper introduces new semi-supervised learning methods that leverage multiple imputations and confidence thresholds to improve accuracy and reduce bias in datasets with non-random missing labels, especially under MNAR conditions.
Contribution
It proposes two novel methods combining multiple imputation models and confidence filtering, including SSL-DI, to address bias and improve performance in non-random missing label scenarios.
Findings
Outperforms existing methods in classification accuracy
Effectively reduces bias in MNAR and MCAR situations
Improves reliability of pseudo-labels through confidence thresholds
Abstract
Semi-supervised learning (SSL) trains algorithms on both labeled and unlabeled data. It is common in practice because obtaining a fully labeled dataset is often unrealistic. Researchers have studied three mechanisms by which labels go missing: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR). MNAR is the most challenging of the three, since one cannot safely assume that all class distributions are equal. Existing methods, including Class-Aware Imputation (CAI) and Class-Aware Propensity (CAP), mostly overlook the non-randomness in the unlabeled data. This paper proposes two new methods that combine multiple imputation models to achieve higher accuracy and less bias. 1) We use multiple imputation models, create confidence intervals, and apply a threshold to ignore pseudo-labels with low confidence. 2) Our new method, SSL with…
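The first method in the abstract (several imputation models, a confidence interval, and a threshold that discards low-confidence pseudo-labels) can be illustrated with a minimal sketch. The abstract does not specify the details, so everything here is an assumption made for illustration: the function name `filter_pseudo_labels`, the normal-approximation interval, the default `threshold=0.8`, and the rule of keeping a pseudo-label only when the lower bound of its interval clears the threshold. It is not the authors' implementation.

```python
import numpy as np

def filter_pseudo_labels(probs, threshold=0.8, z=1.96):
    """Hypothetical helper: keep pseudo-labels that several models agree on.

    probs: array of shape (n_models, n_samples, n_classes) holding the class
           probabilities each imputation model assigns to each unlabeled sample.
    Returns (labels, lower, keep), each of length n_samples.
    """
    probs = np.asarray(probs, dtype=float)
    n_models = probs.shape[0]
    if n_models < 2:
        raise ValueError("need at least two models to estimate a spread")

    # Pseudo-label = class with the highest probability averaged over models.
    mean_probs = probs.mean(axis=0)            # (n_samples, n_classes)
    labels = mean_probs.argmax(axis=1)         # (n_samples,)

    # Probability each model gives to the chosen class of each sample.
    idx = np.arange(labels.shape[0])
    chosen = probs[:, idx, labels]             # (n_models, n_samples)

    # Assumed interval: mean +/- z * standard error across models.
    mean_conf = chosen.mean(axis=0)
    sem = chosen.std(axis=0, ddof=1) / np.sqrt(n_models)
    lower = mean_conf - z * sem

    # Assumed rule: keep a pseudo-label only if the lower bound clears the threshold.
    keep = lower >= threshold
    return labels, lower, keep

# Toy input: 3 models, 2 unlabeled samples, 2 classes.
toy_probs = np.array([
    [[0.95, 0.05], [0.9, 0.1]],
    [[0.90, 0.10], [0.4, 0.6]],
    [[0.92, 0.08], [0.6, 0.4]],
])
labels, lower, keep = filter_pseudo_labels(toy_probs)
```

In the toy input the models agree closely on the first sample and disagree on the second, so only the first pseudo-label passes the filter. With only a handful of models the normal approximation is rough, and a different interval construction may suit the actual method better.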
Taxonomy
Topics: Machine Learning and Data Classification · Face and Expression Recognition · Domain Adaptation and Few-Shot Learning
