TL;DR
DEUCE is a dual-diversity enhancing and uncertainty-aware framework for cold-start active learning in NLP. It selects class-balanced, hard representative instances to improve label efficiency and model performance.
Contribution
The paper proposes a dual-diversity enhancing and uncertainty-aware approach to data selection in cold-start active learning (CSAL), combining a pretrained language model with a graph-based method, the Dual-Neighbor Graph.
Findings
Outperforms existing methods on six NLP datasets.
Achieves more balanced and informative data selection.
Demonstrates efficiency in cold-start active learning scenarios.
Abstract
Cold-start active learning (CSAL) selects valuable instances from an unlabeled dataset for manual annotation. It provides high-quality data at a low annotation cost for label-scarce text classification. However, existing CSAL methods overlook weak classes and hard representative examples, resulting in biased learning. To address these issues, this paper proposes a novel dual-diversity enhancing and uncertainty-aware (DEUCE) framework for CSAL. Specifically, DEUCE leverages a pretrained language model (PLM) to efficiently extract textual representations, class predictions, and predictive uncertainty. Then, it constructs a Dual-Neighbor Graph (DNG) to combine information on both textual diversity and class diversity, ensuring a balanced data distribution. It further propagates uncertainty information via density-based clustering to select hard representative instances. DEUCE performs well…
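The pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the PLM outputs (embeddings and class probabilities) are already available as NumPy arrays, builds the graph as the union of two k-nearest-neighbor graphs (one over embeddings, one over predicted class distributions), uses predictive entropy as the uncertainty measure, and replaces the density-based clustering step with simple neighbor averaging followed by a greedy pick. All function names and the parameters `k` and `budget` are hypothetical.

```python
import numpy as np


def knn_indices(x, k):
    """Indices of the k nearest rows (Euclidean) for each row, excluding itself."""
    dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)  # a point is never its own neighbor
    return np.argsort(dist, axis=1)[:, :k]


def predictive_entropy(probs):
    """Entropy of each row of class probabilities (assumed uncertainty measure)."""
    p = np.clip(probs, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=1)


def dual_neighbor_graph(embeddings, probs, k):
    """Symmetric adjacency: union of kNN over embeddings and kNN over predictions."""
    n = len(embeddings)
    adj = np.zeros((n, n), dtype=bool)
    rows = np.repeat(np.arange(n), k)
    for nbrs in (knn_indices(embeddings, k), knn_indices(probs, k)):
        adj[rows, nbrs.ravel()] = True
    return adj | adj.T


def propagate_uncertainty(adj, uncertainty):
    """Blend each node's uncertainty with the mean of its neighbors.

    Assumption: stands in for the density-based propagation in the abstract.
    """
    a = adj.astype(float)
    degree = np.maximum(a.sum(axis=1), 1.0)
    return 0.5 * uncertainty + 0.5 * (a @ uncertainty) / degree


def select_batch(adj, scores, budget):
    """Greedily take the highest score, then skip that node's graph neighbors."""
    available = np.ones(len(scores), dtype=bool)
    chosen = []
    while len(chosen) < budget and available.any():
        i = int(np.argmax(np.where(available, scores, -np.inf)))
        chosen.append(i)
        available[i] = False
        available[adj[i]] = False  # suppress neighbors to keep the batch diverse
    return chosen


# Usage with random stand-ins for PLM outputs (hypothetical data).
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(40, 16))
logits = rng.normal(size=(40, 4))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

graph = dual_neighbor_graph(embeddings, probs, k=3)
scores = propagate_uncertainty(graph, predictive_entropy(probs))
batch = select_batch(graph, scores, budget=5)
```

Suppressing the neighbors of each chosen instance is one simple way to trade off uncertainty against diversity: no two selected instances are adjacent in the graph, so the batch is spread across both the embedding space and the predicted-class space.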