Committee-Based Sample Selection for Probabilistic Classifiers

S. Argamon-Engelson; I. Dagan

arXiv:1106.0220 · cs.AI · June 2, 2011

TL;DR

This paper introduces a committee-based sample selection method for probabilistic classifiers. It reduces annotation cost by selecting for labeling only the most informative examples, measured by the disagreement among randomly drawn model variants. The method is demonstrated on NLP tagging tasks.

Contribution

The paper extends the Query By Committee paradigm to probabilistic classification models and proposes empirical sample-selection strategies that reduce labeling effort.

Findings

1. Significant reduction in annotation costs achieved
2. Simple two-member committee performs well
3. Sample selection reduces model complexity
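
The second finding is easy to picture with a two-member committee: an example is selected for labeling exactly when two randomly drawn model variants disagree on it. The following Python sketch assumes a toy unigram tagger in which each word's tag counts define a Dirichlet posterior over tag probabilities. This is a standard conjugate choice, not necessarily the sampling distribution used in the paper, and the function names are hypothetical.

```python
import random

def sample_tag_distribution(counts, rng, prior=1.0):
    # Assumption: Dirichlet(counts + prior) posterior over tag probabilities,
    # sampled by normalizing independent Gamma draws.
    draws = [rng.gammavariate(c + prior, 1.0) for c in counts]
    total = sum(draws)
    return [d / total for d in draws]

def predict(dist):
    # A committee member labels the word with its most probable tag.
    return max(range(len(dist)), key=lambda i: dist[i])

def two_member_select(counts, rng):
    # Draw two model variants; select the example iff their labels differ.
    a = predict(sample_tag_distribution(counts, rng))
    b = predict(sample_tag_distribution(counts, rng))
    return a != b

def selection_rate(counts, trials=2000, seed=0):
    # Monte Carlo estimate of how often this example would be selected.
    rng = random.Random(seed)
    hits = sum(two_member_select(counts, rng) for _ in range(trials))
    return hits / trials
```

For a word tagged 200 times with tag 0 and 5 times with tag 1, `selection_rate([200, 5])` comes out near 0, while for a rarely seen, ambiguous word such as `[2, 2]` it is close to 0.5, so the labeling budget flows to uncertain examples. The estimate is Monte Carlo, so values vary slightly with the seed and the number of trials.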

Abstract

In many real-world learning tasks, it is expensive to acquire a sufficient number of labeled examples for training. This paper investigates methods for reducing annotation cost by 'sample selection'. In this approach, during training the learning program examines many unlabeled examples and selects for labeling only those that are most informative at each stage. This avoids redundantly labeling examples that contribute little new information. Our work follows on previous research on Query By Committee, extending the committee-based paradigm to the context of probabilistic classification. We describe a family of empirical methods for committee-based sample selection in probabilistic classification models, which evaluate the informativeness of an example by measuring the degree of disagreement between several model variants. These variants (the committee) are drawn randomly from a…
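
The abstract is cut off before naming a disagreement measure. One standard choice, vote entropy, is sketched below as an assumption: if k committee members classify example e and V(c, e) of them vote for class c, the normalized vote entropy is -(1/log k) Σ_c (V(c, e)/k) log(V(c, e)/k). It is 0 when all members agree and reaches 1 when every member votes for a different class. The helper names `vote_entropy` and `select_batch` are hypothetical, and taking the top n examples from a pool is one illustrative batch policy.

```python
import math
from collections import Counter

def vote_entropy(votes):
    """Normalized vote entropy of one example's committee votes.

    votes: list of class labels, one per committee member (k >= 2).
    Returns 0.0 when all members agree, 1.0 when each votes differently.
    """
    k = len(votes)
    entropy = 0.0
    for count in Counter(votes).values():
        p = count / k
        entropy -= p * math.log(p)
    return entropy / math.log(k)

def select_batch(pool_votes, n):
    """Return indices of the n pool examples with the highest vote entropy."""
    ranked = sorted(range(len(pool_votes)),
                    key=lambda i: vote_entropy(pool_votes[i]),
                    reverse=True)
    return ranked[:n]
```

For the pool `[["NN", "NN", "NN", "NN"], ["NN", "VB", "NN", "VB"], ["NN", "NN", "NN", "VB"]]`, `select_batch(pool, 2)` returns `[1, 2]`: the even 2-2 split (entropy 0.5) outranks the 3-1 split (about 0.41), and the unanimous example is ranked last.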
