Microtask crowdsourcing for disease mention annotation in PubMed abstracts

Benjamin M Good; Max Nanis; Andrew I. Su

arXiv:1408.1928 · cs.CL · August 11, 2014

TL;DR

This study demonstrates that microtask crowdsourcing via Amazon Mechanical Turk can efficiently produce high-quality disease mention annotations in biomedical literature, matching expert standards and enabling scalable corpus creation.

Contribution

The paper introduces a refined crowdsourcing protocol that achieves high annotation accuracy for disease mentions in PubMed abstracts, validated against a gold standard corpus.

Findings

1. Achieved an F measure of 0.872 against the gold standard.

2. Annotations can be tuned for higher precision or recall.

3. Cost-effective annotation at $0.06 per abstract per worker.
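
The three findings can be tied together with a small sketch. It assumes, as an illustration rather than the paper's stated procedure, that a mention is kept when at least `min_votes` workers marked the same span, and that a predicted span counts as correct only on an exact match with the gold standard. The function names, the toy spans, and the worker count are hypothetical; only the $0.06 per abstract per worker price comes from the findings above.

```python
from collections import Counter


def aggregate(worker_spans, min_votes):
    """Keep each span marked by at least `min_votes` workers.

    `worker_spans` is a list with one entry per worker; each entry is a
    list of (start, end) character offsets. Voting is an assumed
    aggregation rule, used here only for illustration.
    """
    votes = Counter(span for spans in worker_spans for span in set(spans))
    return {span for span, count in votes.items() if count >= min_votes}


def precision_recall_f(predicted, gold):
    """Exact-match precision, recall, and balanced F measure."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


def annotation_cost(n_abstracts, workers_per_abstract, price=0.06):
    """Total cost in dollars at a fixed price per abstract per worker."""
    return n_abstracts * workers_per_abstract * price


# Hypothetical toy data: three workers annotating one abstract.
toy_workers = [
    [(0, 13), (40, 52)],
    [(0, 13), (20, 28)],
    [(0, 13), (40, 52), (60, 70)],
]
toy_gold = {(0, 13), (40, 52)}

for k in (1, 2, 3):
    p, r, f = precision_recall_f(aggregate(toy_workers, k), toy_gold)
    print(f"min_votes={k}: precision={p:.3f} recall={r:.3f} F={f:.3f}")

# Hypothetical budget: 100 abstracts, 5 workers each.
print(f"cost: ${annotation_cost(100, 5):.2f}")
```

Under this assumed rule, raising `min_votes` generally favors precision and lowering it favors recall, which is one way the trade-off in the second finding could be tuned.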

Abstract

Identifying concepts and relationships in biomedical text enables knowledge to be applied in computational analyses. Many biological natural language processing (BioNLP) projects attempt to address this challenge, but the state of the art in BioNLP still leaves much room for improvement. Progress in BioNLP research depends on large, annotated corpora for evaluating information extraction systems and training machine learning models. Traditionally, such corpora are created by small numbers of expert annotators, often working over extended periods of time. Recent studies have shown that workers on microtask crowdsourcing platforms such as Amazon's Mechanical Turk (AMT) can, in aggregate, generate high-quality annotations of biomedical text. Here, we investigated the use of AMT in capturing disease mentions in PubMed abstracts. We used the NCBI Disease corpus as a gold standard for…

Taxonomy

Topics: Mobile Crowdsensing and Crowdsourcing · Biomedical Text Mining and Ontologies · Topic Modeling