Support Vector Machine Active Learning Algorithms with Query-by-Committee versus Closest-to-Hyperplane Selection
Michael Bloodgood

TL;DR
This paper compares support vector machine active learning algorithms for imbalanced datasets, built on query-by-committee and closest-to-hyperplane selection, and shows that ClosestPA consistently outperforms QBagPA and QBoostPA on text classification and relation extraction datasets.
Contribution
The paper introduces three algorithms (ClosestPA, QBagPA, QBoostPA) that combine active learning with imbalance handling, and demonstrates the superior performance of ClosestPA across multiple datasets.
Findings
ClosestPA consistently outperforms QBagPA and QBoostPA.
Incorporating imbalance handling improves active learning effectiveness.
The paper offers insights into why ClosestPA outperforms the two query-by-committee algorithms.
Abstract
This paper investigates and evaluates support vector machine active learning algorithms for use with imbalanced datasets, which commonly arise in many applications such as information extraction applications. Algorithms based on closest-to-hyperplane selection and query-by-committee selection are combined with methods for addressing imbalance such as positive amplification based on prevalence statistics from initial random samples. Three algorithms (ClosestPA, QBagPA, and QBoostPA) are presented and carefully evaluated on datasets for text classification and relation extraction. The ClosestPA algorithm is shown to consistently outperform the other two in a variety of ways and insights are provided as to why this is the case.
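The two selection strategies named in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes linear models given as hypothetical `(w, b)` pairs, labels of +1 and -1, and a simple vote-split score for committee disagreement. The `positive_amplification` helper assumes the amplification factor is the ratio of negatives to positives in the initial random sample, which is one plausible reading of "prevalence statistics"; the paper may define it differently. How the committee is built (for example by bagging or boosting, as the names QBagPA and QBoostPA suggest) is left outside the sketch.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]
LinearModel = Tuple[Vector, float]  # hypothetical (weights, bias) pair


def positive_amplification(initial_labels: Sequence[int]) -> float:
    """Assumed cost factor for the positive class: negatives / positives
    observed in the initial random sample (labels are +1 or -1)."""
    positives = sum(1 for y in initial_labels if y == 1)
    negatives = len(initial_labels) - positives
    if positives == 0:
        raise ValueError("initial sample contains no positive examples")
    return negatives / positives


def decision_value(x: Vector, w: Vector, b: float) -> float:
    """Signed score w.x + b of a linear classifier; its magnitude grows
    with distance from the hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def closest_to_hyperplane(pool: Sequence[Vector], w: Vector, b: float,
                          batch_size: int) -> List[int]:
    """Indices of the unlabeled points with the smallest |w.x + b|."""
    ranked = sorted(range(len(pool)),
                    key=lambda i: abs(decision_value(pool[i], w, b)))
    return ranked[:batch_size]


def committee_disagreement(votes: Sequence[int]) -> float:
    """0.0 when all +1/-1 votes agree, 1.0 when they split evenly."""
    return 1.0 - abs(sum(votes)) / len(votes)


def query_by_committee(pool: Sequence[Vector],
                       committee: Sequence[LinearModel],
                       batch_size: int) -> List[int]:
    """Indices of the unlabeled points the committee disagrees on most."""
    scores = []
    for x in pool:
        votes = [1 if decision_value(x, w, b) >= 0 else -1
                 for w, b in committee]
        scores.append(committee_disagreement(votes))
    # sorted() is stable, so ties keep their original pool order.
    ranked = sorted(range(len(pool)), key=lambda i: -scores[i])
    return ranked[:batch_size]
```

In a full active learning loop, the selected indices would be sent for labeling, the amplification factor would weight positive examples when the SVM is retrained, and the process would repeat until the labeling budget is spent.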
