Bad Students Make Great Teachers: Active Learning Accelerates Large-Scale Visual Understanding

Talfan Evans, Shreya Pathak, Hamza Merzic, Jonathan Schwarz, Ryutaro Tanno, Olivier J. Henaff

arXiv:2312.05328 · cs.AI · October 17, 2024 · 1 citation

TL;DR

This paper introduces a scalable active learning method that uses small proxy models to prioritize data, substantially reducing the training updates and total computation that large-scale visual models need to reach the same performance as uniform sampling.

Contribution

The authors propose a novel active learning approach that generalizes across models and tasks, scales to large datasets, and reduces overall FLOP costs by using proxy models for data prioritization.

Findings

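01 Requires 46% and 51% fewer training updates than uniform sampling to reach the same performance

02 Reduces total computation by up to 25%, including the overhead of data selection

03 Sets a new state of the art on multimodal transfer tasks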
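Fewer training updates do not translate one-for-one into lower total compute, because scoring candidate data with proxy models also costs FLOPs. The hypothetical cost model below is a minimal sketch, not the paper's accounting: it assumes compute is dominated by per-example costs, and every parameter name (`update_fraction`, `superbatch_factor`, `proxy_score_cost`, `proxy_train_cost`) is illustrative.

```python
def relative_total_compute(update_fraction: float,
                           superbatch_factor: float,
                           proxy_score_cost: float,
                           proxy_train_cost: float = 0.0) -> float:
    """Hypothetical compute of prioritized training relative to uniform training (= 1.0).

    update_fraction:   learner updates needed, as a fraction of the uniform
                       baseline (0.54 would correspond to "46% fewer updates").
    superbatch_factor: candidate examples scored per example actually trained on.
    proxy_score_cost:  cost of scoring one candidate with the proxy model(s),
                       relative to one learner training step on one example.
    proxy_train_cost:  cost of updating an online proxy per trained example,
                       in the same unit (0.0 if the proxies are frozen).
    """
    for name, value in [("update_fraction", update_fraction),
                        ("superbatch_factor", superbatch_factor),
                        ("proxy_score_cost", proxy_score_cost),
                        ("proxy_train_cost", proxy_train_cost)]:
        if value < 0:
            raise ValueError(f"{name} must be non-negative")
    # Per trained example: one learner step, plus scoring each candidate in
    # its share of the super-batch, plus any proxy training.
    per_example = 1.0 + superbatch_factor * proxy_score_cost + proxy_train_cost
    return update_fraction * per_example


# Made-up inputs for illustration only; these are not values from the paper.
cost = relative_total_compute(update_fraction=0.54,
                              superbatch_factor=2.0,
                              proxy_score_cost=0.05)
print(f"relative compute: {cost:.3f}")  # prints "relative compute: 0.594"
```

With these invented inputs, prioritized training costs about 0.594 of the uniform baseline, roughly 41% less compute, which is a smaller saving than the 46% cut in updates because scoring is not free. Raising `superbatch_factor` or `proxy_score_cost` erodes the savings further, which is why the proxies need to be much cheaper than the learner.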

Abstract

Power-law scaling indicates that large-scale training with uniform sampling is prohibitively slow. Active learning methods aim to increase data efficiency by prioritizing learning on the most relevant examples. Despite their appeal, these methods have yet to be widely adopted, since no single algorithm has been shown to a) generalize across models and tasks, b) scale to large datasets, and c) yield overall FLOP savings when accounting for the overhead of data selection. In this work, we propose a method which satisfies these three properties, leveraging small, cheap proxy models to estimate "learnability" scores for datapoints, which are used to prioritize data for the training of much larger models. As a result, our models require 46% and 51% fewer training updates and up to 25% less total computation to reach the same performance as uniformly trained visual classifiers on JFT and multimodal…
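The abstract does not define the learnability score. A common choice in related work is the reducible-loss idea: an example is worth training on if the current learner still has high loss on it while a reference model achieves low loss, so it is neither already learned nor unlearnable noise. The sketch below assumes that definition, with both losses computed by small proxy models, and keeps the top-scoring fraction of a larger candidate super-batch. The function names and the deterministic top-k rule are assumptions for illustration; the paper's exact scoring and selection procedure may differ.

```python
from typing import Sequence


def learnability_scores(learner_losses: Sequence[float],
                        reference_losses: Sequence[float]) -> list[float]:
    """Assumed learnability: current learner loss minus reference-model loss.

    High when the learner still gets an example wrong but the reference
    model finds it easy; low for examples already learned or hard for both.
    """
    if len(learner_losses) != len(reference_losses):
        raise ValueError("loss lists must have the same length")
    return [l - r for l, r in zip(learner_losses, reference_losses)]


def select_batch(learner_losses: Sequence[float],
                 reference_losses: Sequence[float],
                 keep_fraction: float) -> list[int]:
    """Hypothetical selection: indices of the top `keep_fraction` of a super-batch."""
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    scores = learnability_scores(learner_losses, reference_losses)
    k = max(1, round(keep_fraction * len(scores)))  # always keep at least one
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])  # return indices in their original order


# Toy per-example losses for a super-batch of four candidates (made up).
learner = [2.0, 0.5, 1.5, 3.0]
reference = [0.5, 0.4, 0.5, 2.5]
print(select_batch(learner, reference, keep_fraction=0.5))  # prints [0, 2]
```

Example 0 scores highest because the learner's loss (2.0) is far above the reference loss (0.5). Example 1 is already learned, and example 3 is hard for both models (possibly noisy), so both are dropped even though example 3 has the largest raw learner loss.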

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques

Methods: ALIGN