Single versus Multiple Annotation for Named Entity Recognition of Mutations

David Martinez Iraola; Antonio Jimeno Yepes

arXiv:2101.07450 · cs.CL · January 20, 2021 · 1 citation

TL;DR

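This paper measures how much mutation NER performance is lost when training data is labeled by a single annotator instead of two. It then proposes methods for sampling which training instances to send for a second annotation, and evaluates their effect on data quality and classifier performance.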

Contribution

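It introduces similarity-based and confidence-based rankings for selecting the training instances that receive a second annotation. These rankings aim to improve annotation quality in mutation NER without a full second pass, and the paper compares them with plain single annotation.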
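To make sampling for second annotation concrete, here is a minimal Python sketch of one confidence-based variant, assuming a tagger that outputs per-token tag probabilities. The BIO tag names (`B-MUT`, `O`), the sentence IDs, the min-over-tokens confidence score, and the helper names are hypothetical illustrations, not the authors' implementation.

```python
# Hypothetical sketch of confidence-based sampling for second annotation.
# Assumption: each sentence comes with per-token tag probabilities from a tagger.

TokenProbs = dict[str, float]  # hypothetical: tag -> probability for one token


def sentence_confidence(token_tag_probs: list[TokenProbs]) -> float:
    """Score a sentence by its least certain token (an assumed heuristic)."""
    return min(max(probs.values()) for probs in token_tag_probs)


def select_for_second_annotation(pool: dict[str, list[TokenProbs]], budget: int) -> list[str]:
    """Return the IDs of the `budget` least confident sentences."""
    ranked = sorted(pool, key=lambda sid: sentence_confidence(pool[sid]))
    return ranked[:budget]


# Toy data with hypothetical BIO tags: "B-MUT" marks the start of a mutation mention.
pool = {
    "s1": [{"O": 0.98, "B-MUT": 0.02}, {"O": 0.05, "B-MUT": 0.95}],
    "s2": [{"O": 0.52, "B-MUT": 0.48}, {"O": 0.99, "B-MUT": 0.01}],
    "s3": [{"O": 0.30, "B-MUT": 0.70}],
}
print(select_for_second_annotation(pool, budget=2))  # ['s2', 's3']
```

Taking the minimum over tokens means that one uncertain token, such as an ambiguous mutation mention, is enough to flag a sentence. Averaging token confidences would be a gentler alternative.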

Findings

01. Multiple annotators reduce labeling errors.

02. Sampling methods improve dataset quality.

03. Higher-quality annotation leads to better NER performance.

Abstract

The focus of this paper is to address the knowledge acquisition bottleneck for Named Entity Recognition (NER) of mutations, by analysing different approaches to build manually annotated data. We first address the impact of using a single annotator versus two annotators, in order to measure whether multiple annotators are required. Once we evaluate the performance loss when using a single annotator, we apply different methods to sample the training data for second annotation, aiming at improving the quality of the dataset without requiring a full pass. We use held-out double-annotated data to build two scenarios with different types of rankings: similarity-based and confidence-based. We evaluate both approaches on: (i) their ability to identify training instances that are erroneous (cases where single-annotator labels differ from double-annotation after discussion), and (ii) on Mutation NER…
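Criterion (i) can be read as a retrieval task. A ranking is good if the instances it puts first are the ones whose single-annotator labels disagree with the adjudicated double annotation. The minimal Python sketch below scores a ranking that way, using precision and recall over the top k instances. The choice of these metrics, the tag tuples, and the IDs are illustrative assumptions rather than the paper's reported evaluation.

```python
# Hypothetical sketch: scoring a ranking by how well it surfaces erroneous
# single-annotated instances, judged against adjudicated double annotation.

Tags = tuple[str, ...]  # hypothetical BIO tag sequence for one sentence


def erroneous_ids(single: dict[str, Tags], adjudicated: dict[str, Tags]) -> set[str]:
    """IDs whose single-annotator tags differ from the adjudicated tags."""
    shared = single.keys() & adjudicated.keys()
    return {sid for sid in shared if single[sid] != adjudicated[sid]}


def precision_recall_at_k(ranking: list[str], errors: set[str], k: int) -> tuple[float, float]:
    """Precision and recall of the top-k ranked IDs at finding erroneous ones."""
    top = ranking[:k]
    hits = sum(1 for sid in top if sid in errors)
    precision = hits / len(top) if top else 0.0
    recall = hits / len(errors) if errors else 0.0
    return precision, recall


single = {"s1": ("O", "B-MUT"), "s2": ("O", "O"), "s3": ("B-MUT",), "s4": ("O",)}
adjudicated = {"s1": ("O", "B-MUT"), "s2": ("O", "B-MUT"), "s3": ("O",), "s4": ("O",)}
errors = erroneous_ids(single, adjudicated)  # contains 's2' and 's3'
ranking = ["s2", "s4", "s3", "s1"]  # e.g. output of a confidence- or similarity-based ranker
print(precision_recall_at_k(ranking, errors, k=2))  # (0.5, 0.5)
```

Any ranker that produces an ordered list of instance IDs, whether similarity-based or confidence-based, can be compared this way at the same annotation budget k.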

Code & Models

Repositories

rishabgit/genomic-info-from-papers

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Web Data Mining and Analysis