An Approach to Reducing Annotation Costs for BioNLP

Michael Bloodgood; K. Vijay-Shanker

arXiv:1409.3881 · cs.CL · September 16, 2014

TL;DR

This paper argues that ClosestInitPA, an active learning algorithm the authors developed previously, is a natural fit for BioNLP tasks that combine redundant training data, burdensome annotation costs, good SVM performance, and class imbalance, and that it can significantly reduce annotation costs for such tasks.

Contribution

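The paper proposes applying ClosestInitPA, a previously developed active learning algorithm, to the broad range of BioNLP tasks that share four characteristics (redundant training data, burdensome annotation costs, suitability for SVMs, and imbalanced classes) in order to reduce annotation costs.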

Findings

01 Effective in reducing annotation costs for BioNLP tasks

02 Works well with imbalanced datasets and SVM classifiers

03 Applicable to tasks with redundant training data

Abstract

There is a broad range of BioNLP tasks for which active learning (AL) can significantly reduce annotation costs, and a specific AL algorithm we have developed is particularly effective in reducing annotation costs for these tasks. We have previously developed an AL algorithm called ClosestInitPA that works best with tasks that have the following characteristics: redundancy in training material, burdensome annotation costs, good performance from Support Vector Machines (SVMs), and imbalanced datasets (i.e., when set up as a binary classification problem, one class is substantially rarer than the other). Many BioNLP tasks have these characteristics, and thus our AL algorithm is a natural approach to apply to BioNLP tasks.
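
The abstract does not describe how ClosestInitPA works, so the following is not that algorithm. It is only a minimal sketch of two generic ideas commonly used when SVM-based active learning meets class imbalance: asking for labels on the unlabeled examples closest to the current hyperplane, and estimating how much extra weight the rare class might need from an initial random sample. The function names `closest_to_hyperplane` and `positive_cost_factor`, and the negative-to-positive ratio heuristic, are assumptions made for illustration.

```python
from typing import List, Sequence


def closest_to_hyperplane(scores: Sequence[float], batch_size: int) -> List[int]:
    """Pick the examples a margin-based active learner would query next.

    `scores` are signed SVM decision values for the unlabeled pool
    (hypothetical input; any trained SVM could supply them). Examples
    with the smallest absolute score lie closest to the hyperplane.
    """
    ranked = sorted(range(len(scores)), key=lambda i: abs(scores[i]))
    return ranked[:batch_size]


def positive_cost_factor(initial_labels: Sequence[int]) -> float:
    """Estimate a misclassification cost ratio for the rare positive class.

    Assumed heuristic: the ratio of negatives to positives in an initial
    random sample, where labels are 1 (positive) or 0 (negative).
    """
    positives = sum(1 for y in initial_labels if y == 1)
    negatives = len(initial_labels) - positives
    if positives == 0:
        raise ValueError("initial sample contains no positive examples")
    return negatives / positives


# Example: five unlabeled examples and a ten-item initial random sample.
pool_scores = [2.5, -0.1, 0.7, -3.0, 0.05]
to_annotate = closest_to_hyperplane(pool_scores, batch_size=2)
cost_ratio = positive_cost_factor([1, 0, 0, 0, 0, 1, 0, 0, 0, 0])
```

In a full loop, the selected indices would go to an annotator, the SVM would be retrained with the rare class weighted by something like `cost_ratio`, and the process would repeat until the annotation budget runs out.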

Taxonomy

Topics: Machine Learning and Algorithms