Low Budget Active Learning via Wasserstein Distance: An Integer Programming Approach

Rafid Mahmood; Sanja Fidler; Marc T. Law

arXiv:2106.02968 · cs.LG · March 8, 2023 · 1 citation

TL;DR

This paper presents a novel integer programming approach for active learning that minimizes Wasserstein distance to select representative data points, especially effective in low-label regimes.

Contribution

It introduces an integer optimization model that selects a core set minimizing the discrete Wasserstein distance from the unlabeled pool, and shows that the model can be solved tractably with a Generalized Benders Decomposition algorithm.

Findings

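01 Outperforms baseline methods in the low-budget regime, where less than one percent of the data set is labeled

02 Uses high-quality latent features obtained by unsupervised learning on the unlabeled pool

03 Solves the core set selection problem tractably with Generalized Benders Decomposition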

Abstract

Active learning is the process of training a model with limited labeled data by selecting a core subset of an unlabeled data pool to label. The large scale of data sets used in deep learning forces most sample selection strategies to employ efficient heuristics. This paper introduces an integer optimization problem for selecting a core set that minimizes the discrete Wasserstein distance from the unlabeled pool. We demonstrate that this problem can be tractably solved with a Generalized Benders Decomposition algorithm. Our strategy uses high-quality latent features that can be obtained by unsupervised learning on the unlabeled pool. Numerical results on several data sets show that our optimization approach is competitive with baselines and particularly outperforms them in the low budget regime where less than one percent of the data set is labeled.
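The selection objective described in the abstract can be illustrated on a toy problem. The sketch below is not the paper's algorithm: it assumes one-dimensional features and a uniform weighting over the selected points, and it replaces Generalized Benders Decomposition with brute-force enumeration, which is only feasible for very small pools. The function names `wasserstein_1d` and `select_core_set` and the example data are hypothetical.

```python
from itertools import combinations


def wasserstein_1d(xs, ys):
    """Wasserstein-1 distance between the uniform empirical
    distributions on xs and ys (1-D features only, an assumption)."""
    points = sorted(set(xs) | set(ys))
    total = 0.0
    # In 1-D, W1 equals the area between the two empirical CDFs.
    for left, right in zip(points, points[1:]):
        cdf_x = sum(x <= left for x in xs) / len(xs)
        cdf_y = sum(y <= left for y in ys) / len(ys)
        total += abs(cdf_x - cdf_y) * (right - left)
    return total


def select_core_set(pool, budget):
    """Enumerate every subset of size `budget` and keep the one closest
    to the pool. Brute force stands in for the paper's solver here."""
    best = min(
        combinations(range(len(pool)), budget),
        key=lambda idx: wasserstein_1d([pool[i] for i in idx], pool),
    )
    return list(best), wasserstein_1d([pool[i] for i in best], pool)


# Toy pool with two well-separated clusters (made-up values).
pool = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
indices, distance = select_core_set(pool, budget=2)
print(indices, round(distance, 4))
```

With a budget of two, the minimizer picks the central point of each cluster, which matches the intuition that a core set with small Wasserstein distance to the pool is a representative one. The number of candidate subsets grows combinatorially with pool size, which is why a tractable solution method matters at realistic scale.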

Taxonomy

Topics: Machine Learning and Algorithms · Sparse and Compressive Sensing Techniques · Domain Adaptation and Few-Shot Learning