Bounds on the Generalization Error in Active Learning

Vincent Menden; Yahya Saleh; Armin Iske

arXiv:2409.09078 · stat.ML · September 17, 2024

Open Access

TL;DR

This paper derives upper bounds on the generalization error in active learning, linking query strategies and hypothesis complexity, and provides a foundation for designing and evaluating active learning algorithms.

Contribution

It introduces empirical risk minimization bounds for active learning, connecting informativeness and representativeness strategies, and relates these bounds to hypothesis class regularization.

Findings

01 Bounds align with empirical observations.

02 Combining informativeness and representativeness improves query algorithms.

03 Regularization ensures the bounds' validity.

Abstract

We establish empirical risk minimization principles for active learning by deriving a family of upper bounds on the generalization error. Aligning with empirical observations, the bounds suggest that superior query algorithms can be obtained by combining both informativeness and representativeness query strategies, where the latter is assessed using integral probability metrics. To facilitate the use of these bounds in application, we systematically link diverse active learning scenarios, characterized by their loss functions and hypothesis classes, to their corresponding upper bounds. Our results show that regularization techniques used to constrain the complexity of various hypothesis classes are sufficient conditions to ensure the validity of the bounds. The present work enables principled construction and empirical quality-evaluation of query algorithms in active learning.
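
Illustration

The abstract states that representativeness is assessed with integral probability metrics, but this page does not say which metric or estimator the paper uses. As a minimal sketch only, the Python below computes one well-known integral probability metric, the maximum mean discrepancy (MMD) with a Gaussian kernel, between a queried subset and the unlabeled pool. The function names `rbf_kernel` and `mmd_squared`, the bandwidth `gamma=1.0`, and the synthetic data are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    # Pairwise squared Euclidean distances, then the Gaussian (RBF) kernel.
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def mmd_squared(x, y, gamma=1.0):
    # Biased (V-statistic) estimate of the squared MMD between samples x and y.
    k_xx = rbf_kernel(x, x, gamma).mean()
    k_yy = rbf_kernel(y, y, gamma).mean()
    k_xy = rbf_kernel(x, y, gamma).mean()
    return k_xx + k_yy - 2.0 * k_xy

# Synthetic unlabeled pool (assumption: 2-D standard normal data).
rng = np.random.default_rng(0)
pool = rng.normal(size=(500, 2))

# A uniformly drawn query set versus one concentrated in a corner of the pool.
representative = pool[rng.choice(500, size=50, replace=False)]
skewed = pool[np.argsort(pool[:, 0])[-50:]]  # 50 points with the largest first coordinate

mmd_rep = mmd_squared(representative, pool)
mmd_skew = mmd_squared(skewed, pool)
print(f"uniform subset: {mmd_rep:.4f}, skewed subset: {mmd_skew:.4f}")
```

Under these assumptions the skewed subset should score a noticeably larger discrepancy than the uniform one, which is the sense in which such a metric can penalize query sets that fail to represent the pool. How a term of this kind enters the paper's bounds, and with what constants, is not specified here.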

Taxonomy

Topics: Machine Learning and Algorithms · Computability, Logic, AI Algorithms