Target-Independent Active Learning via Distribution-Splitting
Xiaofeng Cao, Ivor W. Tsang, Xiaofeng Xu, Guandong Xu

TL;DR
This paper introduces a target-independent active learning method that uses distribution-splitting based on number density. The method reduces both label complexity and dependence on the initial hypothesis, and comes with theoretical guarantees and practical effectiveness.
Contribution
It proposes a novel distribution-splitting strategy based on number density that breaks the dependence on the initial hypothesis and provides theoretical guarantees for active learning.
Findings
The method reduces label complexity similarly to volume-splitting.
It breaks the curse of initial hypothesis dependence.
Experiments show improved halving and querying abilities.
Abstract
To reduce the label complexity of Agnostic Active Learning (the A² algorithm), volume-splitting splits the hypothesis edges to reduce the Vapnik-Chervonenkis (VC) dimension in version space. However, the effectiveness of volume-splitting critically depends on the initial hypothesis; this problem is also known as the target-dependent label complexity gap. This paper attempts to minimize this gap by introducing a novel notion of number density, which provides a more natural and direct way than volume to describe the hypothesis distribution. By discovering the connections between the hypothesis and input distributions, we map the volume of version space into the number density and propose a target-independent distribution-splitting strategy with the following advantages: 1) it provides theoretical guarantees on reducing label complexity and error rate, as volume-splitting does; 2) it breaks the curse of initial…
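The underlying intuition, choosing queries whose answers split the version space as evenly as possible, can be illustrated with a toy example. The sketch below is a generic halving (generalized binary search) learner, not the paper's distribution-splitting algorithm. It assumes a hypothetical finite class of 1-D threshold classifiers, noise-free labels, and measures how evenly a query splits the version space by simply counting surviving hypotheses, a crude stand-in for the paper's notion of number density. The names `predict`, `positives`, `split_score`, and `halving_active_learner` are illustrative only.

```python
from typing import Callable, List, Sequence, Tuple


def predict(t: float, x: float) -> int:
    """Hypothetical toy hypothesis class: threshold h_t(x) = 1 if x >= t else 0."""
    return 1 if x >= t else 0


def positives(version_space: Sequence[float], x: float) -> int:
    """Number of surviving hypotheses that would label x as positive."""
    return sum(predict(t, x) for t in version_space)


def split_score(version_space: Sequence[float], x: float) -> int:
    """Imbalance of the split induced by querying x, counted in hypotheses.

    0 means the version space is cut exactly in half. Counting hypotheses is an
    assumed stand-in for splitting by number density, not the paper's definition.
    """
    return abs(2 * positives(version_space, x) - len(version_space))


def halving_active_learner(
    pool: Sequence[float],
    hypotheses: Sequence[float],
    oracle: Callable[[float], int],
) -> Tuple[List[float], List[float]]:
    """Generic halving learner (not the paper's algorithm); assumes noise-free labels."""
    version_space = list(hypotheses)
    unlabeled = list(pool)
    queried: List[float] = []
    while True:
        # Keep only points on which the surviving hypotheses disagree;
        # the label of any other point is already implied.
        unlabeled = [
            x for x in unlabeled
            if 0 < positives(version_space, x) < len(version_space)
        ]
        if not unlabeled:
            break
        # Query the point whose answer splits the version space most evenly.
        x = min(unlabeled, key=lambda p: split_score(version_space, p))
        y = oracle(x)
        queried.append(x)
        # Prune hypotheses inconsistent with the new label.
        version_space = [t for t in version_space if predict(t, x) == y]
    return queried, version_space


if __name__ == "__main__":
    hypotheses = [i / 100 for i in range(101)]    # thresholds 0.00 .. 1.00
    pool = [(i + 0.5) / 100 for i in range(100)]  # unlabeled points between them
    true_t = 0.37                                 # hypothetical target threshold
    queried, vs = halving_active_learner(pool, hypotheses, lambda x: predict(true_t, x))
    # The version space shrinks to the single threshold 0.37.
    print(len(queried), vs)
```

With 101 candidate thresholds, each query keeps at most half (rounded up) of the surviving hypotheses, so this toy learner needs at most 7 labels instead of labeling all 100 pool points. Note that the toy setting is realizable and noise-free, unlike the agnostic setting targeted by A².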
Taxonomy
Topics: Machine Learning and Algorithms · Algorithms and Data Compression · Machine Learning and Data Classification
