An Approach to Reducing Annotation Costs for BioNLP
Michael Bloodgood, K. Vijay-Shanker

TL;DR
This paper argues that ClosestInitPA, an active learning algorithm the authors developed previously, can significantly reduce annotation effort for BioNLP tasks characterized by redundant training data, costly annotation, good SVM performance, and class imbalance.
Contribution
The paper makes the case that the authors' existing active learning algorithm, ClosestInitPA, is a natural fit for the many BioNLP tasks that share certain data characteristics, and that applying it can reduce annotation costs.
Findings
ClosestInitPA is reported to be particularly effective at reducing annotation costs for BioNLP tasks
It works best on imbalanced datasets where SVM classifiers perform well
It suits tasks whose training data is redundant and costly to annotate
Abstract
There is a broad range of BioNLP tasks for which active learning (AL) can significantly reduce annotation costs and a specific AL algorithm we have developed is particularly effective in reducing annotation costs for these tasks. We have previously developed an AL algorithm called ClosestInitPA that works best with tasks that have the following characteristics: redundancy in training material, burdensome annotation costs, Support Vector Machines (SVMs) work well for the task, and imbalanced datasets (i.e. when set up as a binary classification problem, one class is substantially rarer than the other). Many BioNLP tasks have these characteristics and thus our AL algorithm is a natural approach to apply to BioNLP tasks.
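The abstract does not spell out how ClosestInitPA works, so the following is only a rough illustration of the general idea behind margin-based active learning on imbalanced data, not the authors' implementation. It assumes a linear classifier with already-trained weights `w` and bias `b`, selects the unlabeled examples closest to the decision hyperplane for annotation, and estimates a cost factor for the rare class from the class ratio of an initial random sample. All function names, the reading of the algorithm's name, and the toy data are hypothetical.

```python
from collections import Counter


def margin_distance(x, w, b):
    """Absolute value of the linear decision function w.x + b.

    This is the unnormalized distance of x from the hyperplane;
    dividing by ||w|| would not change the ranking.
    """
    return abs(sum(wi * xi for wi, xi in zip(w, x)) + b)


def select_closest(pool, w, b, batch_size):
    """Return indices of the batch_size pool examples nearest the hyperplane.

    Assumption: the examples the model is least certain about are the
    most useful ones to send to the annotator next.
    """
    ranked = sorted(range(len(pool)), key=lambda i: margin_distance(pool[i], w, b))
    return ranked[:batch_size]


def positive_cost_factor(initial_labels, positive=1):
    """Ratio of negatives to positives in an initial random sample.

    Assumption: this ratio is used to penalize errors on the rare
    (positive) class more heavily when training the classifier.
    """
    counts = Counter(initial_labels)
    n_pos = counts[positive]
    n_neg = len(initial_labels) - n_pos
    if n_pos == 0:
        raise ValueError("initial sample contains no positive examples")
    return n_neg / n_pos


# Toy usage with made-up 2-D feature vectors and a made-up model.
pool = [(2.0, 1.0), (0.1, 0.0), (-3.0, 0.5), (0.0, -0.2)]
w, b = (1.0, 1.0), 0.0
to_annotate = select_closest(pool, w, b, batch_size=2)
cost = positive_cost_factor([1, -1, -1, -1, -1])
```

In a full loop, the selected examples would be labeled, added to the training set, and the classifier retrained before the next selection round; the cost factor would be passed to the learner as a class weight.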
Taxonomy
Topics: Machine Learning and Algorithms
