Diversity-Aware Batch Active Learning for Dependency Parsing
Tianze Shi, Adrian Benton, Igor Malioutov, Ozan İrsoy

TL;DR
This paper uses determinantal point processes (DPPs) to select diverse batches for active learning, reducing the number of annotations needed to train a dependency parser, with the largest gains in the early stages of learning.
Contribution
It introduces a novel diversity-aware batch active learning method with DPPs for dependency parsing, demonstrating improved performance over non-diverse strategies.
Findings
DPP-based sampling outperforms diversity-agnostic methods in early learning stages.
Diversity-aware sampling maintains robustness under corpus duplication.
Batches selected with DPPs lead to better parser performance with fewer annotations.
Abstract
While the predictive performance of modern statistical dependency parsers relies heavily on the availability of expensive expert-annotated treebank data, not all annotations contribute equally to the training of the parsers. In this paper, we attempt to reduce the number of labeled examples needed to train a strong dependency parser using batch active learning (AL). In particular, we investigate whether enforcing diversity in the sampled batches, using determinantal point processes (DPPs), can improve over their diversity-agnostic counterparts. Simulation experiments on an English newswire corpus show that selecting diverse batches with DPPs is superior to strong selection strategies that do not enforce batch diversity, especially during the initial stages of the learning process. Additionally, our diversity-aware strategy is robust under a corpus duplication setting, where…
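To make the idea of diversity-aware batch selection concrete, here is a minimal sketch of greedy DPP-style selection. It is an illustration under stated assumptions, not the paper's implementation: the quality-times-similarity kernel, the cosine similarity over sentence feature vectors, the greedy log-determinant search, and the name `greedy_dpp_batch` are all choices made for this example. The summary above does not specify the kernel or the sampling procedure the authors use.

```python
import numpy as np


def greedy_dpp_batch(quality, features, batch_size):
    """Greedily pick a batch that maximizes the log-determinant of a DPP kernel.

    quality:  (n,) per-sentence scores, e.g. model uncertainty (assumed input).
    features: (n, d) per-sentence feature vectors (assumed input).
    """
    # Normalize rows so the similarity matrix holds cosine similarities.
    feats = features / np.linalg.norm(features, axis=1, keepdims=True)
    similarity = feats @ feats.T
    # Quality-diversity decomposition: L_ij = q_i * S_ij * q_j.
    kernel = quality[:, None] * similarity * quality[None, :]
    # Small jitter keeps submatrices with duplicate items numerically stable.
    kernel = kernel + 1e-6 * np.eye(len(quality))

    selected = []
    for _ in range(batch_size):
        best_item, best_logdet = None, -np.inf
        for i in range(len(quality)):
            if i in selected:
                continue
            idx = selected + [i]
            sign, logdet = np.linalg.slogdet(kernel[np.ix_(idx, idx)])
            if sign > 0 and logdet > best_logdet:
                best_item, best_logdet = i, logdet
        if best_item is None:
            break
        selected.append(best_item)
    return selected


# Items 0 and 1 are exact duplicates with the highest quality score;
# item 2 is different but scores slightly lower.
quality = np.array([1.0, 1.0, 0.8])
features = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
batch = greedy_dpp_batch(quality, features, batch_size=2)
```

In this toy case a diversity-agnostic top-2 selection by quality would pick the two duplicates, while the determinant-based objective penalizes the near-singular pair and picks one of the duplicates plus the distinct item. This mirrors, in miniature, why a diversity-aware strategy can stay robust when the corpus contains duplicated sentences.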
Taxonomy
Topics: Machine Learning and Algorithms · Topic Modeling · Natural Language Processing Techniques
