Learning From Less Data: Diversified Subset Selection and Active Learning in Image Classification Tasks
Vishal Kaushal, Anurag Sahoo, Khoshrav Doctor, Narasimha Raju, Suyash Shetty, Pankaj Singh, Rishabh Iyer, Ganesh Ramakrishnan

TL;DR
This paper demonstrates that selecting diverse subsets of training data can improve image classification accuracy and reduce labeling efforts, enabling effective training with less data in computer vision tasks.
Contribution
It empirically validates the effectiveness of diversity-based subset selection models for reducing data requirements and labeling costs in image classification.
Findings
Diversity-based subset selection improves accuracy by 2-3%.
Effective with less training data, especially for CNNs.
Reduces human labeling efforts in active learning.
Abstract
Supervised machine learning based state-of-the-art computer vision techniques are in general data hungry and pose the challenges of not having adequate computing resources and of high costs involved in human labeling efforts. Training data subset selection and active learning techniques have been proposed as possible solutions to these challenges respectively. A special class of subset selection functions naturally model notions of diversity, coverage and representation and they can be used to eliminate redundancy and thus lend themselves well for training data subset selection. They can also help improve the efficiency of active learning in further reducing human labeling efforts by selecting a subset of the examples obtained using the conventional uncertainty sampling based techniques. In this work we empirically demonstrate the effectiveness of two diversity models, namely the…
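The two ideas in the abstract, picking a non-redundant training subset and thinning an uncertainty-sampled pool down to a diverse batch, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the facility-location objective is used here only as a common stand-in for a diversity/representation function (the abstract is cut off before naming the two models the paper evaluates), and the function names, the entropy-based uncertainty score, and the rescaled cosine similarity are all hypothetical choices.

```python
import numpy as np


def greedy_facility_location(similarity, budget):
    """Greedily pick up to `budget` indices that maximise
    f(S) = sum_i max_{j in S} similarity[i, j].

    Assumes `similarity` is a square matrix of non-negative scores.
    """
    n = similarity.shape[0]
    selected = []
    # best_cover[i] = similarity of point i to its closest selected point so far
    best_cover = np.zeros(n)
    for _ in range(min(budget, n)):
        # Marginal gain of adding each candidate column j
        gains = np.maximum(similarity - best_cover[:, None], 0.0).sum(axis=0)
        if selected:
            gains[selected] = -np.inf  # never pick the same point twice
        j = int(np.argmax(gains))
        selected.append(j)
        best_cover = np.maximum(best_cover, similarity[:, j])
    return selected


def prediction_entropy(probs):
    """Per-example entropy of predicted class probabilities (assumed uncertainty score)."""
    p = np.clip(probs, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=1)


def diverse_uncertain_batch(probs, features, n_candidates, batch_size):
    """Keep the `n_candidates` most uncertain examples, then choose a
    diverse `batch_size` subset of them to send for labeling."""
    uncertain = np.argsort(-prediction_entropy(probs))[:n_candidates]
    feats = features[uncertain]
    norms = np.maximum(np.linalg.norm(feats, axis=1, keepdims=True), 1e-12)
    normed = feats / norms
    # Cosine similarity rescaled to [0, 1] so the scores are non-negative
    sim = (normed @ normed.T + 1.0) / 2.0
    picked = greedy_facility_location(sim, batch_size)
    return uncertain[picked]
```

In this sketch, `features` would typically be embeddings from the current model, and `probs` its softmax outputs on the unlabeled pool; both are assumptions about how such a pipeline might be wired up rather than details taken from the paper.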
Taxonomy
Topics: Machine Learning and Algorithms · Machine Learning and Data Classification · Domain Adaptation and Few-Shot Learning
