BASIL: Balanced Active Semi-supervised Learning for Class Imbalanced Datasets
Suraj Kothawade, Pavan Kumar Reddy, Ganesh Ramakrishnan, Rishabh Iyer

TL;DR
BASIL is a novel active semi-supervised learning algorithm that effectively balances class distribution during data selection, improving model fairness and accuracy on imbalanced datasets.
Contribution
BASIL introduces a submodular mutual information (SMI) based approach to balanced data selection for SSL, improving performance across various SSL methods.
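As a rough illustration of what per-class selection with a submodular mutual information style objective could look like, the sketch below greedily picks unlabeled points that are most similar to the few labeled examples of each class, using the same budget for every class. It is a minimal sketch under stated assumptions, not the authors' implementation: the facility-location-style score, the cosine similarity, the `eta` weight, and all function names (`query_relevance_score`, `greedy_select`, `balanced_select`) are hypothetical choices made for this example.

```python
import numpy as np


def cosine_similarity(a, b):
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T


def query_relevance_score(sim, selected, eta=1.0):
    """Facility-location-style score between a query set and a selected set.

    sim has shape (n_query, n_unlabeled); selected is a list of column indices.
    The form of this score and the eta weight are assumptions for illustration.
    """
    if not selected:
        return 0.0
    sub = sim[:, selected]
    coverage = sub.max(axis=1).sum()   # best selected match for each query point
    relevance = sub.max(axis=0).sum()  # best query match for each selected point
    return coverage + eta * relevance


def greedy_select(sim, budget, excluded=()):
    """Greedily add the candidate with the largest marginal gain in score."""
    selected = []
    candidates = set(range(sim.shape[1])) - set(excluded)
    for _ in range(min(budget, len(candidates))):
        base = query_relevance_score(sim, selected)
        best, best_gain = None, -np.inf
        for j in sorted(candidates):
            gain = query_relevance_score(sim, selected + [j]) - base
            if gain > best_gain:
                best, best_gain = j, gain
        selected.append(best)
        candidates.remove(best)
    return selected


def balanced_select(unlabeled, labeled_by_class, budget_per_class):
    """Select the same number of unlabeled points for every class.

    labeled_by_class maps a class id to an array of labeled feature vectors,
    which act as the query set for that class.
    """
    already_chosen = []
    picks = {}
    for cls, queries in labeled_by_class.items():
        sim = cosine_similarity(queries, unlabeled)
        picks[cls] = greedy_select(sim, budget_per_class, excluded=already_chosen)
        already_chosen.extend(picks[cls])
    return picks


# Toy pool: six majority-class points near (1, 0), two minority points near (0, 1).
unlabeled = np.array([
    [1.0, 0.0], [1.0, 0.1], [1.0, 0.2], [0.9, 0.1], [1.0, 0.05], [0.95, 0.0],
    [0.0, 1.0], [0.1, 1.0],
])
labeled_by_class = {0: np.array([[1.0, 0.05]]), 1: np.array([[0.05, 1.0]])}
picks = balanced_select(unlabeled, labeled_by_class, budget_per_class=2)
print(picks)
```

In an active learning loop, the selected indices would be sent for labeling and added to the labeled set before the next round. Random sampling from the toy pool above would mostly return majority-class points, whereas the per-class budget targets the two minority points explicitly.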
Findings
BASIL outperforms existing active learning methods in class imbalance scenarios.
The approach improves SSL model fairness and accuracy.
BASIL is effective on medical imaging datasets such as Path-MNIST and Organ-MNIST.
Abstract
Current semi-supervised learning (SSL) methods assume a balance between the number of data points available for each class in both the labeled and the unlabeled datasets. However, a class imbalance exists naturally in most real-world datasets. It is known that training models on such imbalanced datasets leads to biased models, which in turn produce predictions biased towards the more frequent classes. This issue is even more pronounced in SSL methods, as they use this biased model to obtain pseudo-labels (on the unlabeled data) during training. In this paper, we tackle this problem by attempting to select a balanced labeled dataset for SSL that would result in an unbiased model. Unfortunately, acquiring a balanced labeled dataset from a class-imbalanced distribution in one shot is challenging. We propose BASIL (Balanced Active Semi-supervIsed Learning), a novel algorithm that…
Taxonomy
Topics: Imbalanced Data Classification Techniques · Machine Learning and Algorithms · COVID-19 diagnosis using AI
