A Benchmark and Comparison of Active Learning for Logistic Regression
Yazhou Yang, Marco Loog

TL;DR
This paper benchmarks active learning methods for logistic regression on three synthetic and 44 real-world datasets. Uncertainty sampling, one of the earliest and simplest methods, performs exceptionally well overall, and random sampling is not clearly beaten by individual active learning techniques in many cases.
Contribution
It compares state-of-the-art active learning techniques for logistic regression, discussing their underlying characteristics, their learning-curve performance, and their computational costs on synthetic and real-world datasets.
Findings
Uncertainty sampling performs exceptionally well overall.
Random sampling remains competitive and is not outperformed by complex methods in many cases.
Active learning methods vary in effectiveness depending on dataset and learning curve stage.
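To make the first two findings concrete, here is a minimal sketch of uncertainty sampling next to the random baseline. It assumes a binary logistic regression model that has already been fitted; the function names, the toy pool, and the weights are hypothetical illustrations, not code or data from the paper.

```python
import math
import random


def predict_proba(weights, bias, x):
    """Logistic regression posterior P(y = 1 | x) for one feature vector."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def uncertainty_query(weights, bias, pool):
    """Index of the unlabeled point whose posterior is closest to 0.5."""
    return min(
        range(len(pool)),
        key=lambda i: abs(predict_proba(weights, bias, pool[i]) - 0.5),
    )


def random_query(pool, rng):
    """Random sampling baseline: pick any unlabeled point uniformly."""
    return rng.randrange(len(pool))


# Hypothetical unlabeled pool and fitted parameters (illustration only).
pool = [[2.0, 1.0], [0.1, -0.2], [-3.0, 0.5]]
weights, bias = [1.0, 1.0], 0.0

chosen = uncertainty_query(weights, bias, pool)
baseline = random_query(pool, random.Random(0))
```

In this toy pool the second point lies nearest the decision boundary, so uncertainty sampling selects it. In a full active learning loop the chosen point would be labeled, added to the training set, and the model refitted before the next query.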
Abstract
Logistic regression is by far the most widely used classifier in real-world applications. In this paper, we benchmark the state-of-the-art active learning methods for logistic regression and discuss and illustrate their underlying characteristics. Experiments are carried out on three synthetic datasets and 44 real-world datasets, providing insight into the behaviors of these active learning methods with respect to the area of the learning curve (which plots classification accuracy as a function of the number of queried examples) and their computational costs. Surprisingly, one of the earliest and simplest suggested active learning methods, i.e., uncertainty sampling, performs exceptionally well overall. Another remarkable finding is that random sampling, which is the rudimentary baseline to improve upon, is not overwhelmed by individual active learning techniques in many cases.
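The abstract evaluates methods by the area of the learning curve. One plausible way to compute such a score is the trapezoidal rule over accuracy versus number of queried examples, normalized by the query range. This is a sketch under that assumption; the paper's exact definition and normalization are not stated here, and the function name and accuracy values below are made up for illustration.

```python
def learning_curve_area(num_queried, accuracies):
    """Trapezoidal area under the accuracy-vs-queries curve,
    divided by the query range so the score stays in [0, 1]."""
    if len(num_queried) != len(accuracies) or len(num_queried) < 2:
        raise ValueError("need at least two matching (queries, accuracy) points")
    area = 0.0
    for i in range(1, len(num_queried)):
        width = num_queried[i] - num_queried[i - 1]
        area += width * (accuracies[i] + accuracies[i - 1]) / 2.0
    return area / (num_queried[-1] - num_queried[0])


# Made-up learning curves, not results from the paper.
queries = [0, 10, 20, 30]
active_acc = [0.5, 0.7, 0.8, 0.85]
random_acc = [0.5, 0.6, 0.7, 0.8]

active_score = learning_curve_area(queries, active_acc)
random_score = learning_curve_area(queries, random_acc)
```

A single-number summary like this rewards methods that reach high accuracy early, which is why two methods with the same final accuracy can still receive different scores.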
Taxonomy
Methods: Logistic Regression
