SimPro: A Simple Probabilistic Framework Towards Realistic Long-Tailed Semi-Supervised Learning
Chaoqun Du, Yizeng Han, Gao Huang

TL;DR
SimPro is a probabilistic framework for long-tailed semi-supervised learning. It handles imbalanced labeled data together with unlabeled data whose class distribution is unknown and possibly mismatched, without assuming a particular form for that distribution. The authors report state-of-the-art results.
Contribution
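The paper presents SimPro, a probabilistic framework that makes no predefined assumptions about the class distribution of unlabeled data. It refines the expectation-maximization (EM) algorithm by explicitly decoupling the modeling of conditional and marginal class distributions, which improves class distribution estimation and pseudo-labeling in semi-supervised learning.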
Findings
Achieves state-of-the-art performance across diverse benchmarks.
Handles imbalanced labeled data and unlabeled data whose class distribution is unknown or mismatched.
Provides theoretical guarantees and is easy to implement.
Abstract
Recent advancements in semi-supervised learning have focused on a more realistic yet challenging task: addressing imbalances in labeled data while the class distribution of unlabeled data remains both unknown and potentially mismatched. Current approaches in this sphere often presuppose rigid assumptions regarding the class distribution of unlabeled data, thereby limiting the adaptability of models to only certain distribution ranges. In this study, we propose a novel approach, introducing a highly adaptable framework, designated as SimPro, which does not rely on any predefined assumptions about the distribution of unlabeled data. Our framework, grounded in a probabilistic model, innovatively refines the expectation-maximization (EM) algorithm by explicitly decoupling the modeling of conditional and marginal class distributions. This separation facilitates a closed-form solution for…
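To make the idea of separating conditional and marginal class distributions concrete, here is a minimal sketch of the classic EM procedure for re-estimating a class prior under label shift. It is a generic illustration and not the authors' implementation: the abstract above is truncated, so nothing here should be read as SimPro's actual update rules. The function name `em_class_prior` and its parameters are hypothetical, and the sketch assumes a fixed classifier whose posteriors were computed under a known training prior.

```python
import numpy as np

def em_class_prior(cond_probs, train_prior, n_iters=100, tol=1e-8):
    """Estimate the class prior of unlabeled data with EM (generic sketch).

    cond_probs:  (N, K) classifier posteriors p(y|x) on the unlabeled samples,
                 assumed to have been computed under `train_prior`.
    train_prior: (K,) class prior of the data the classifier was trained on.
    Returns the estimated prior (K,) and the adjusted posteriors (N, K).
    """
    cond_probs = np.asarray(cond_probs, dtype=float)
    train_prior = np.asarray(train_prior, dtype=float)
    prior = train_prior.copy()   # start from the training prior
    resp = cond_probs.copy()     # adjusted posteriors (responsibilities)
    for _ in range(n_iters):
        # E-step: re-weight each posterior by the ratio of the current prior
        # estimate to the training prior, then renormalize per sample.
        weighted = cond_probs * (prior / train_prior)
        resp = weighted / weighted.sum(axis=1, keepdims=True)
        # M-step: the prior update has a closed form, the mean responsibility.
        new_prior = resp.mean(axis=0)
        converged = np.abs(new_prior - prior).max() < tol
        prior = new_prior
        if converged:
            break
    return prior, resp

# Usage: posteriors for four unlabeled samples from a two-class model
# trained under a balanced prior (all values are made up for illustration).
probs = np.array([[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.2, 0.8]])
estimated_prior, pseudo_posteriors = em_class_prior(probs, [0.5, 0.5])
pseudo_labels = pseudo_posteriors.argmax(axis=1)
```

In this sketch the conditional model (the classifier) stays fixed while only the marginal is updated, which is what keeps the M-step a one-line average. A full semi-supervised method would also retrain the classifier on the resulting pseudo-labels.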
Taxonomy
Topics: Machine Learning and Data Classification
