Committee-Based Sample Selection for Probabilistic Classifiers
S. Argamon-Engelson, I. Dagan

TL;DR
This paper introduces a committee-based sample selection method for probabilistic classifiers. It reduces annotation cost by selecting for labeling only the examples on which a committee of model variants disagrees most, and is demonstrated on NLP tagging tasks.
Contribution
The paper extends Query By Committee to probabilistic classification models and proposes a family of empirical sample selection methods that reduce labeling effort.
Findings
Significant reduction in annotation costs achieved
Simple two-member committee performs well
Sample selection also reduces the size of the learned model
Abstract
In many real-world learning tasks, it is expensive to acquire a sufficient number of labeled examples for training. This paper investigates methods for reducing annotation cost by 'sample selection'. In this approach, during training the learning program examines many unlabeled examples and selects for labeling only those that are most informative at each stage. This avoids redundantly labeling examples that contribute little new information. Our work follows on previous research on Query By Committee, extending the committee-based paradigm to the context of probabilistic classification. We describe a family of empirical methods for committee-based sample selection in probabilistic classification models, which evaluate the informativeness of an example by measuring the degree of disagreement between several model variants. These variants (the committee) are drawn randomly from a…
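The idea of scoring an example by committee disagreement can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a toy classifier whose only parameters are one multinomial over labels, that committee members are drawn from a Dirichlet posterior over those parameters, and that disagreement is measured as the entropy of the members' votes. The function names, the `prior` parameter, and the choice of vote entropy are all illustrative assumptions.

```python
import math
import random
from collections import Counter


def sample_dirichlet(counts, rng, prior=1.0):
    """Draw one multinomial parameter vector from Dirichlet(counts + prior).

    Assumption: a symmetric Dirichlet prior; `prior` is a hypothetical knob.
    """
    draws = [rng.gammavariate(c + prior, 1.0) for c in counts]
    total = sum(draws)
    return [d / total for d in draws]


def vote_entropy(votes):
    """Entropy (in bits) of the labels voted for by the committee members."""
    k = len(votes)
    return -sum((n / k) * math.log2(n / k) for n in Counter(votes).values())


def committee_disagreement(label_counts, k, rng):
    """Sample k model variants and score how much their predictions differ.

    `label_counts` holds how often each label was observed in the labeled
    data for this kind of example (an assumption of this toy model).
    """
    votes = []
    for _ in range(k):
        probs = sample_dirichlet(label_counts, rng)
        # Each committee member votes for its own most probable label.
        votes.append(max(range(len(probs)), key=probs.__getitem__))
    return vote_entropy(votes)
```

Under these assumptions, a selection loop would request a label only for examples whose disagreement score is high. For example, `committee_disagreement([5, 5], 2, random.Random(0))` scores an ambiguous example with a two-member committee, while an example with lopsided counts such as `[1000, 0]` almost never produces disagreement.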
