Margin-based sampling in high dimensions: When being active is less efficient than staying passive
Alexandru Tifrea, Jacob Clarysse, Fanny Yang

TL;DR
This paper shows that in high dimensions, and under the same labeling budget, passive learning can outperform margin-based active learning, with the gap growing as class separation shrinks. This challenges the common belief that active learning is the better choice.
Contribution
The paper proves, for logistic regression, that passive learning can outperform margin-based active learning in high dimensions, even for noiseless data and when the Bayes optimal decision boundary is used for sampling. Experiments support the theoretical result.
Findings
Passive learning can outperform margin-based active learning in high dimensions.
The performance gap widens as class separation decreases.
Empirical results across diverse datasets confirm theoretical insights.
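To make the two strategies being compared concrete, here is a minimal sketch of how each one picks which points to label. It is an illustration under stated assumptions, not the authors' code: the helper names (`make_pool`, `margin_query`, `passive_query`), the toy data model (two noiseless classes separated along the first coordinate), and all parameter values are hypothetical. The sketch only shows query selection and does not train logistic regression or reproduce any result from the paper.

```python
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def make_pool(n, dim, separation, rng):
    """Toy noiseless pool (an assumption of this sketch, not the paper's setup).

    The label is the sign of the first coordinate, and every point lies at
    least `separation` away from the hyperplane x[0] = 0.
    """
    pool, labels = [], []
    for _ in range(n):
        y = rng.choice((-1, 1))
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        x[0] = y * (separation + abs(x[0]))
        pool.append(x)
        labels.append(y)
    return pool, labels


def margin_query(pool, theta, budget):
    """Margin-based sampling: the `budget` points closest to theta . x = 0."""
    ranked = sorted(range(len(pool)), key=lambda i: abs(dot(theta, pool[i])))
    return ranked[:budget]


def passive_query(pool, budget, rng):
    """Passive learning: a uniform random subset of the pool."""
    return rng.sample(range(len(pool)), budget)


rng = random.Random(0)
dim, budget, separation = 50, 10, 0.5
pool, labels = make_pool(n=200, dim=dim, separation=separation, rng=rng)

# In this toy model the Bayes optimal boundary is x[0] = 0.
theta_star = [1.0] + [0.0] * (dim - 1)

active_idx = margin_query(pool, theta_star, budget)
passive_idx = passive_query(pool, budget, rng)


def mean_margin(idx):
    """Average distance to the boundary over the selected points."""
    return sum(abs(dot(theta_star, pool[i])) for i in idx) / len(idx)
```

Comparing `mean_margin(active_idx)` with `mean_margin(passive_idx)` shows that the margin-based query concentrates on points near the boundary, while the passive query reflects the whole pool. That difference in what gets labeled is the one the paper analyzes in high dimensions.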
Abstract
It is widely believed that given the same labeling budget, active learning (AL) algorithms like margin-based active learning achieve better predictive performance than passive learning (PL), albeit at a higher computational cost. Recent empirical evidence suggests that this added cost might be in vain, as margin-based AL can sometimes perform even worse than PL. While existing works offer different explanations in the low-dimensional regime, this paper shows that the underlying mechanism is entirely different in high dimensions: we prove for logistic regression that PL outperforms margin-based AL even for noiseless data and when using the Bayes optimal decision boundary for sampling. Insights from our proof indicate that this high-dimensional phenomenon is exacerbated when the separation between the classes is small. We corroborate this intuition with experiments on 20 high-dimensional…
Taxonomy
Topics: Machine Learning and Algorithms · Mineral Processing and Grinding · Machine Learning and Data Classification
Methods: Logistic Regression
