Active Learning from the Web

Ryoma Sato

arXiv:2210.08205 · cs.LG · February 13, 2023

TL;DR

This paper proposes Seafaring, an efficient method for performing active learning directly on the vast, unstructured pool of unlabelled data available on the Web. Demonstrated on a large-scale image collection, it improves label efficiency compared with the small, task-specific pools used in traditional active learning.

Contribution

The paper introduces Seafaring, a user-side retrieval algorithm that enables active learning from extremely large web data pools without constructing a task-specific pool.

Findings

01. Seafaring outperforms existing methods on large-scale web data.

02. The method effectively retrieves informative data from a pool of over ten billion images.

03. Active learning from web data significantly reduces labeling costs.

Abstract

Labeling data is one of the most costly processes in machine learning pipelines. Active learning is a standard approach to alleviating this problem. Pool-based active learning first builds a pool of unlabelled data and iteratively selects data to be labeled so that the total number of required labels is minimized, keeping the model performance high. Many effective criteria for choosing data from the pool have been proposed in the literature. However, how to build the pool is less explored. Specifically, most of the methods assume that a task-specific pool is given for free. In this paper, we advocate that such a task-specific pool is not always available and propose the use of a myriad of unlabelled data on the Web for the pool for which active learning is applied. As the pool is extremely large, it is likely that relevant data exist in the pool for many tasks, and we do not need to…
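The pool-based loop described in the abstract can be illustrated with a minimal sketch. This is not Seafaring and not the author's code: it assumes a small in-memory pool, a toy nearest-centroid model, and predictive entropy as the selection criterion. All function and variable names (`active_learning`, `oracle`, `pool_X`, and so on) are hypothetical and chosen only for illustration.

```python
import numpy as np

def entropy(probs):
    """Predictive entropy per row; higher means the model is less certain."""
    p = np.clip(probs, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=1)

def fit_centroids(X, y, n_classes):
    """Toy model (an assumption): one mean vector per class."""
    return np.stack([X[y == c].mean(axis=0) for c in range(n_classes)])

def predict_proba(centroids, X):
    """Softmax over negative squared distances to each class centroid."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    logits = -d2
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)

def active_learning(pool_X, oracle, seed_idx, n_classes, rounds, batch):
    """Iteratively label the pool items the current model is least sure about."""
    labeled = list(seed_idx)
    labels = {i: oracle(i) for i in labeled}
    for _ in range(rounds):
        X = pool_X[labeled]
        y = np.array([labels[i] for i in labeled])
        centroids = fit_centroids(X, y, n_classes)
        scores = entropy(predict_proba(centroids, pool_X))
        scores[labeled] = -np.inf  # never query an already labeled item
        for i in np.argsort(-scores)[:batch]:
            labels[int(i)] = oracle(int(i))  # ask the annotator for a label
            labeled.append(int(i))
    return labeled, labels

# Synthetic pool: two Gaussian blobs standing in for unlabelled data.
rng = np.random.default_rng(0)
pool_X = np.concatenate([rng.normal(-2.0, 1.0, (50, 2)),
                         rng.normal(2.0, 1.0, (50, 2))])
true_y = np.array([0] * 50 + [1] * 50)

# The seed set must contain at least one example of every class.
labeled, labels = active_learning(
    pool_X, oracle=lambda i: int(true_y[i]), seed_idx=[0, 50],
    n_classes=2, rounds=3, batch=5,
)
print(len(labeled), "items labeled out of", len(pool_X))
```

The sketch scores every item in the pool on each round, which is only feasible for a small pool. The abstract's setting, where the pool is the Web itself, is exactly the case in which such exhaustive scoring is unavailable and a retrieval-based approach is needed instead.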

Code & Models

Repositories

joisino/seafaring (PyTorch, official)

Taxonomy

Topics: Machine Learning and Algorithms · Machine Learning and Data Classification · Advanced Bandit Algorithms Research