Single versus Multiple Annotation for Named Entity Recognition of Mutations
David Martinez Iraola, Antonio Jimeno Yepes

TL;DR
This paper investigates the effects of single versus multiple annotations in mutation NER, proposing sampling methods to improve data quality and evaluating their impact on classifier performance.
Contribution
It introduces sampling strategies for annotation quality improvement in mutation NER and assesses their effectiveness compared to traditional single annotation.
Findings
Multiple annotators reduce labeling errors.
Sampling methods improve dataset quality.
Enhanced annotation leads to better NER performance.
Abstract
The focus of this paper is to address the knowledge acquisition bottleneck for Named Entity Recognition (NER) of mutations, by analysing different approaches to building manually annotated data. We first address the impact of using a single annotator vs. two annotators, in order to measure whether multiple annotators are required. Once we have evaluated the performance loss when using a single annotator, we apply different methods to sample the training data for second annotation, aiming to improve the quality of the dataset without requiring a full pass. We use held-out double-annotated data to build two scenarios with different types of rankings: similarity-based and confidence-based. We evaluate both approaches on: (i) their ability to identify training instances that are erroneous (cases where single-annotator labels differ from double-annotation after discussion), and (ii) Mutation NER…
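The confidence-based scenario and evaluation criterion (i) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy confidence scores, and the choice of "recall of erroneous instances in the top k" as the measure are all assumptions made for illustration. The sketch assumes each singly annotated training instance has a confidence score from some classifier, and that the least confident instances are sent for second annotation first.

```python
from typing import List, Sequence, Set


def rank_for_second_annotation(confidences: Sequence[float]) -> List[int]:
    """Order instance indices from least to most confident.

    Assumption: a low confidence score suggests the single-annotator
    label is more likely to be wrong, so those instances come first.
    """
    return sorted(range(len(confidences)), key=lambda i: confidences[i])


def error_recall_at_k(ranking: Sequence[int], erroneous: Set[int], k: int) -> float:
    """Fraction of known-erroneous instances found among the first k ranked."""
    if not erroneous:
        return 0.0
    selected = set(ranking[:k])
    return len(selected & erroneous) / len(erroneous)


# Hypothetical toy data: one confidence score per training instance.
confidences = [0.95, 0.40, 0.88, 0.15, 0.70]
# Hypothetical gold information from held-out double annotation:
# indices where the single-annotator label differs from the agreed label.
erroneous = {1, 3}

ranking = rank_for_second_annotation(confidences)
recall_top1 = error_recall_at_k(ranking, erroneous, k=1)
recall_top2 = error_recall_at_k(ranking, erroneous, k=2)
print(ranking, recall_top1, recall_top2)
```

With this toy data, re-annotating only the two lowest-confidence instances would recover both erroneous labels. A similarity-based ranking could be plugged into the same evaluation by replacing the scoring step.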
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Web Data Mining and Analysis
