Bounds on the Generalization Error in Active Learning
Vincent Menden, Yahya Saleh, Armin Iske

TL;DR
This paper derives upper bounds on the generalization error in active learning, linking query strategies and hypothesis complexity, and provides a foundation for designing and evaluating active learning algorithms.
Contribution
It introduces empirical risk minimization bounds for active learning, connecting informativeness and representativeness strategies, and relates these bounds to hypothesis class regularization.
Findings
Bounds align with empirical observations.
Combining informativeness and representativeness improves query algorithms.
Regularization ensures the bounds' validity.
Abstract
We establish empirical risk minimization principles for active learning by deriving a family of upper bounds on the generalization error. In line with empirical observations, the bounds suggest that superior query algorithms can be obtained by combining informativeness and representativeness query strategies, where the latter is assessed using integral probability metrics. To facilitate the use of these bounds in applications, we systematically link diverse active learning scenarios, characterized by their loss functions and hypothesis classes, to their corresponding upper bounds. Our results show that regularization techniques used to constrain the complexity of various hypothesis classes are sufficient conditions to ensure the validity of the bounds. The present work enables principled construction and empirical quality evaluation of query algorithms in active learning.
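To make the idea of combining informativeness with an integral probability metric concrete, the following is a minimal sketch and not the paper's algorithm. It assumes the maximum mean discrepancy (MMD) with an RBF kernel as the integral probability metric, and the names `rbf_kernel`, `mmd_squared`, `query_score`, `uncertainty`, and `trade_off` are hypothetical choices made for illustration. A candidate scores higher when the model is uncertain about it and when adding it makes the labeled set resemble the unlabeled pool more closely.

```python
import numpy as np


def rbf_kernel(a, b, gamma=1.0):
    """RBF kernel matrix between the rows of a and the rows of b."""
    sq_dist = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))


def mmd_squared(x, y, gamma=1.0):
    """Biased (V-statistic) estimate of the squared MMD between samples x and y."""
    return (
        rbf_kernel(x, x, gamma).mean()
        + rbf_kernel(y, y, gamma).mean()
        - 2.0 * rbf_kernel(x, y, gamma).mean()
    )


def query_score(pool, labeled, candidate_idx, uncertainty, trade_off=1.0, gamma=1.0):
    """Hypothetical score: informativeness minus a representativeness penalty.

    uncertainty[i] is any informativeness measure for pool[i] (an assumption,
    e.g. predictive entropy). The penalty is the squared MMD between the pool
    and the labeled set augmented with the candidate.
    """
    augmented = np.vstack([labeled, pool[candidate_idx:candidate_idx + 1]])
    return uncertainty[candidate_idx] - trade_off * mmd_squared(augmented, pool, gamma)


# Toy pool with two clusters; only the cluster at 0 is labeled so far.
pool = np.array([[0.0], [0.0], [10.0], [10.0]])
labeled = np.array([[0.0]])
uncertainty = np.array([0.5, 0.5, 0.5, 0.5])  # equal informativeness everywhere

score_same_cluster = query_score(pool, labeled, 1, uncertainty)
score_new_cluster = query_score(pool, labeled, 2, uncertainty)
```

With equal uncertainty, the candidate from the unlabeled cluster receives the higher score, because adding it brings the labeled set closer to the pool distribution under the chosen metric. The trade-off weight and kernel bandwidth are free parameters in this sketch.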
Taxonomy
Topics: Machine Learning and Algorithms · Computability, Logic, AI Algorithms
