The Cost of Replicability in Active Learning
Rupkatha Hira, Dominik Kau, Jessica Sorrell

TL;DR
This paper explores the additional label complexity incurred when enforcing replicability in active learning, proposing new algorithms that balance efficiency with stability.
Contribution
The paper introduces two replicable active learning algorithms, one for realizable learning of finite hypothesis classes and one for the agnostic setting, and analyzes their label complexity under the replicability constraint.
Findings
Replicability increases label complexity in active learning.
CAL and A^2 algorithms remain efficient despite the replicability constraint.
Proposed algorithms achieve substantial label savings while ensuring replicability.
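For readers unfamiliar with CAL, the following is a minimal sketch of the classical, non-replicable disagreement-based procedure in the realizable setting. It is not the paper's replicable algorithm. The function names, the finite class of threshold classifiers on [0, 1], and the simulated labeling oracle are all illustrative assumptions.

```python
import random

def cal(hypotheses, stream, oracle):
    """Classical CAL sketch: query a label only where surviving hypotheses disagree."""
    version_space = list(hypotheses)
    queries = 0
    for x in stream:
        predictions = {h(x) for h in version_space}
        if len(predictions) > 1:  # x lies in the disagreement region
            y = oracle(x)         # pay for one label
            queries += 1
            # Keep only hypotheses consistent with the queried label.
            version_space = [h for h in version_space if h(x) == y]
        # Otherwise every surviving hypothesis agrees, so no label is needed.
    return version_space, queries

def make_threshold(t):
    """Hypothetical hypothesis: predict 1 iff x >= t."""
    return lambda x: int(x >= t)

# Assumed finite hypothesis class: 21 thresholds on a grid over [0, 1].
hypotheses = [make_threshold(i / 20) for i in range(21)]
target = hypotheses[7]  # realizable: the labeling function is in the class

rng = random.Random(0)
stream = [rng.random() for _ in range(500)]
survivors, queries = cal(hypotheses, stream, target)
print(f"labels queried: {queries} of {len(stream)}")
```

In this sketch every query removes at least one hypothesis, so the number of labels requested can never exceed the size of the class minus one, regardless of how long the unlabeled stream is.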
Abstract
Active learning aims to reduce the number of labeled data points required by machine learning algorithms by selectively querying labels from initially unlabeled data. Ensuring replicability, where an algorithm produces consistent outcomes across different runs, is essential for the reliability of machine learning models but often increases sample complexity. This paper investigates the cost of replicability in active learning using two classical disagreement-based methods: the CAL and A^2 algorithms. Leveraging randomized thresholding techniques, we propose two replicable active learning algorithms: one for realizable learning of finite hypothesis classes and another for the agnostic setting. Our theoretical analysis shows that while enforcing replicability increases label complexity, CAL and A^2 still achieve substantial label savings under this constraint. These findings provide…
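The randomized thresholding idea mentioned in the abstract can be illustrated with a small sketch. This shows the general technique under stated assumptions, not the paper's exact procedure: two runs share a random seed, draw the same cutoff from an interval, and compare their own (slightly different) empirical estimates against it. They disagree only when the cutoff lands between the two estimates. The function names, the interval, and the example estimates are hypothetical.

```python
import random

def replicable_threshold_test(estimate, low, high, shared_seed):
    """Compare an estimate against a cutoff drawn uniformly from [low, high].

    The cutoff depends only on shared_seed, so two runs using the same seed
    compare their estimates against the same cutoff.
    """
    rng = random.Random(shared_seed)
    cutoff = rng.uniform(low, high)
    return estimate <= cutoff

def disagreement_rate(est_a, est_b, low, high, trials=10000):
    """Fraction of shared seeds on which two runs reach different decisions."""
    seed_source = random.Random(0)
    flips = 0
    for _ in range(trials):
        seed = seed_source.getrandbits(32)
        a = replicable_threshold_test(est_a, low, high, seed)
        b = replicable_threshold_test(est_b, low, high, seed)
        flips += (a != b)
    return flips / trials

# Two runs whose estimates differ by 0.01, with the cutoff drawn from an
# interval of width 0.2: they should disagree on roughly 0.01 / 0.2 of seeds.
rate = disagreement_rate(0.30, 0.31, low=0.2, high=0.4)
print(f"empirical disagreement rate: {rate:.3f}")
```

The design trade-off is visible in the ratio: a wider cutoff interval or tighter estimates lower the chance of disagreement, and tighter estimates require more samples, which is one intuition for why replicability can raise label complexity.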
Taxonomy
Topics: Machine Learning and Algorithms
