# Learning From Less Data: A Unified Data Subset Selection and Active Learning Framework for Computer Vision

**Authors:** Vishal Kaushal, Rishabh Iyer, Suraj Kothawade, Rohan Mahadev, Khoshrav Doctor, Ganesh Ramakrishnan

arXiv: 1901.01151 · 2019-01-07

## TL;DR

This paper presents a unified framework combining data subset selection and active learning to reduce labeling costs in computer vision, demonstrating that diversity-based subset selection improves accuracy with less data across various tasks.

## Contribution

It empirically demonstrates the effectiveness of two diversity models, Facility-Location and Dispersion, for training data subset selection and for reducing labeling effort in active learning across a range of computer vision tasks.
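This summary does not give the paper's formulations, so the following is a minimal sketch that assumes the commonly used textbook forms of the two models: Facility-Location as $f(S) = \sum_{i \in V} \max_{j \in S} \mathrm{sim}(i, j)$, and Dispersion as the minimum pairwise distance within $S$. The function names, the distance-to-similarity conversion, and the toy points are illustrative assumptions, not taken from the paper.

```python
import math


def pairwise_distances(points):
    """Euclidean distance matrix for a list of equal-length feature vectors."""
    return [[math.dist(p, q) for q in points] for p in points]


def facility_location_greedy(points, k):
    """Greedily maximize f(S) = sum_i max_{j in S} sim(i, j) (representation)."""
    n = len(points)
    if n == 0 or k <= 0:
        return []
    dist = pairwise_distances(points)
    max_d = max(max(row) for row in dist)
    # Assumption: similarity = (largest distance - distance), so it is non-negative.
    sim = [[max_d - d for d in row] for row in dist]
    best = [0.0] * n  # best[i]: similarity of point i to its closest selected point
    selected = []
    for _ in range(min(k, n)):
        def gain(j):
            # Marginal gain of adding candidate j to the current selection.
            return sum(max(sim[i][j] - best[i], 0.0) for i in range(n))
        pick = max((j for j in range(n) if j not in selected), key=gain)
        selected.append(pick)
        best = [max(best[i], sim[i][pick]) for i in range(n)]
    return selected


def dispersion_greedy(points, k, start=0):
    """Farthest-point heuristic for f(S) = min over pairs in S of dist(i, j)."""
    n = len(points)
    if n == 0 or k <= 0:
        return []
    dist = pairwise_distances(points)
    selected = [start]  # Assumption: the seed point is arbitrary.
    while len(selected) < min(k, n):
        rest = [j for j in range(n) if j not in selected]
        # Add the point whose nearest selected neighbour is farthest away.
        pick = max(rest, key=lambda j: min(dist[j][s] for s in selected))
        selected.append(pick)
    return selected


# Toy data: two tight clusters of three points each (indices 0-2 and 3-5).
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
fl = facility_location_greedy(points, 2)
dm = dispersion_greedy(points, 2)
# On this toy data both selections contain one point from each cluster.
```

In practice the points would be feature vectors (for example, CNN embeddings of images) rather than 2-D coordinates, and the choice of similarity measure is a design decision this sketch does not settle.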

## Key findings

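- Diversity-based subset selection improves accuracy by up to 5-10% over existing baselines, particularly when less training data is available.
- Convolutional Neural Networks can be trained with much less training data and labeling cost while incurring minimal performance loss.
- The approach is effective across Gender Recognition, Face Recognition, Scene Recognition, Object Detection and Object Recognition.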

## Abstract

Supervised machine learning based state-of-the-art computer vision techniques are in general data hungry. Their data curation poses the challenges of expensive human labeling, inadequate computing resources and larger experiment turnaround times. Training data subset selection and active learning techniques have been proposed as possible solutions to these challenges. A special class of subset selection functions naturally models notions of diversity, coverage and representation and can be used to eliminate redundancy, thus lending itself well to training data subset selection. These functions can also help improve the efficiency of active learning, further reducing human labeling effort, by selecting a subset of the examples obtained using conventional uncertainty sampling based techniques. In this work, we empirically demonstrate the effectiveness of two diversity models, namely the Facility-Location and Dispersion models, for training-data subset selection and reducing labeling effort. We demonstrate this across the board for a variety of computer vision tasks including Gender Recognition, Face Recognition, Scene Recognition, Object Detection and Object Recognition. Our results show that diversity-based subset selection done in the right way can increase the accuracy by up to 5-10% over existing baselines, particularly in settings in which less training data is available. This allows the training of complex machine learning models like Convolutional Neural Networks with much less training data and labeling costs while incurring minimal performance loss.
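The two-stage active learning idea described in the abstract (shortlist examples by uncertainty, then keep a diverse subset of them for labeling) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: predictive entropy stands in for the uncertainty measure, a greedy farthest-point rule stands in for the diversity model, and `select_batch`, `shortlist_size`, and `batch_size` are hypothetical names.

```python
import math


def entropy(probs):
    """Predictive entropy of one class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_batch(features, probs, shortlist_size, batch_size):
    """Pick examples to label: uncertainty shortlist, then diverse subset."""
    # Stage 1 (uncertainty sampling): keep the most uncertain examples.
    order = sorted(range(len(probs)), key=lambda i: entropy(probs[i]), reverse=True)
    shortlist = order[:shortlist_size]
    if not shortlist or batch_size <= 0:
        return []
    # Stage 2 (diversity): greedy farthest-point selection inside the shortlist,
    # seeded with the single most uncertain example (an assumption of this sketch).
    chosen = [shortlist[0]]
    while len(chosen) < min(batch_size, len(shortlist)):
        rest = [i for i in shortlist if i not in chosen]
        nxt = max(
            rest,
            key=lambda i: min(math.dist(features[i], features[j]) for j in chosen),
        )
        chosen.append(nxt)
    return chosen


# Toy unlabeled pool: feature vectors and hypothetical model probabilities.
features = [(0.0, 0.0), (0.1, 0.0), (4.0, 4.0), (9.0, 9.0), (0.0, 0.1)]
probs = [[0.50, 0.50], [0.55, 0.45], [0.60, 0.40], [0.99, 0.01], [0.58, 0.42]]
batch = select_batch(features, probs, shortlist_size=4, batch_size=2)
# The confident example (index 3) is dropped at stage 1; stage 2 then avoids
# sending two near-duplicate uncertain examples to the annotator.
```

The point of the second stage is that the most uncertain examples are often near-duplicates of one another, so labeling all of them adds little information per label.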

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1901.01151/full.md

## Figures

12 figures with captions in the complete paper: https://tomesphere.com/paper/1901.01151/full.md

## References

69 references — full list in the complete paper: https://tomesphere.com/paper/1901.01151/full.md

---
Source: https://tomesphere.com/paper/1901.01151