# Crowdsourcing Ground Truth for Medical Relation Extraction

**Authors:** Anca Dumitrache, Lora Aroyo, Chris Welty

arXiv: 1701.02185 · 2018-09-27

## TL;DR

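This paper applies the CrowdTruth method, which treats annotator disagreement as a signal of ambiguity rather than noise, to crowdsource ground truth for medical relation extraction on the *cause* and *treat* relations. The resulting crowd data matches domain-expert quality at lower cost and yields better training data at scale than distant supervision.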

## Contribution

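It shows that a crowdsourcing approach that models ambiguity through annotator disagreement can produce an annotated data set for medical relation extraction (*cause* and *treat*) that is effective in supervised training, and it proposes and validates weighted precision, recall, and F-measure that account for ambiguity.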

## Key findings

- Crowd annotations collected with CrowdTruth reach the quality of domain-expert annotations for this task while reducing cost.
- Crowd data that models ambiguity provides better training data at scale than distant supervision.
- New weighted measures for precision, recall, and F-measure account for ambiguity in both human and machine performance.
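
To make the idea of weighted evaluation measures concrete, here is a minimal sketch of weighted precision, recall, and F-measure in which each sentence contributes its weight instead of a count of 1. This summary does not give the paper's exact definitions, so the function name `weighted_prf` and the choice of weight (for example, a per-sentence clarity score in [0, 1]) are assumptions for illustration only.

```python
from typing import Sequence, Tuple


def weighted_prf(
    gold: Sequence[bool],
    pred: Sequence[bool],
    weights: Sequence[float],
) -> Tuple[float, float, float]:
    """Weighted precision, recall, and F1 for one binary relation.

    Hypothetical helper: each sentence adds its weight (assumed to be a
    clarity score in [0, 1]) to the confusion cell it falls into, so
    ambiguous sentences influence the scores less than clear ones.
    """
    if not (len(gold) == len(pred) == len(weights)):
        raise ValueError("gold, pred, and weights must have equal length")

    tp = fp = fn = 0.0
    for g, p, w in zip(gold, pred, weights):
        if p and g:
            tp += w  # weighted true positive
        elif p and not g:
            fp += w  # weighted false positive
        elif g and not p:
            fn += w  # weighted false negative

    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom > 0 else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = [True, True, False, False]
    pred = [True, False, True, False]
    weights = [1.0, 0.5, 0.2, 1.0]  # made-up clarity scores
    print(weighted_prf(gold, pred, weights))
```

With all weights set to 1.0 this reduces to the usual unweighted measures, which makes it easy to compare the two views on the same predictions.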

## Abstract

Cognitive computing systems require human labeled data for evaluation, and often for training. The standard practice used in gathering this data minimizes disagreement between annotators, and we have found this results in data that fails to account for the ambiguity inherent in language. We have proposed the CrowdTruth method for collecting ground truth through crowdsourcing, that reconsiders the role of people in machine learning based on the observation that disagreement between annotators provides a useful signal for phenomena such as ambiguity in the text. We report on using this method to build an annotated data set for medical relation extraction for the $cause$ and $treat$ relations, and how this data performed in a supervised training experiment. We demonstrate that by modeling ambiguity, labeled data gathered from crowd workers can (1) reach the level of quality of domain experts for this task while reducing the cost, and (2) provide better training data at scale than distant supervision. We further propose and validate new weighted measures for precision, recall, and F-measure, that account for ambiguity in both human and machine performance on this task.
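
As an illustration of how disagreement between annotators can be kept as a signal rather than collapsed into a single label, the sketch below aggregates per-worker relation choices for one sentence into a graded score per relation. The abstract does not specify a scoring formula; the cosine-style normalization, the function name `sentence_relation_scores`, and the example labels are assumptions, not the authors' implementation.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Sequence, Set


def sentence_relation_scores(
    worker_annotations: Iterable[Set[str]],
    relations: Sequence[str],
) -> Dict[str, float]:
    """Score each relation for one sentence from crowd annotations.

    Hypothetical aggregation: sum the workers' choices into a count
    vector over `relations`, then take the cosine between that vector
    and the unit vector of each relation (count / vector norm).
    A score near 1 means workers agree on that relation; lower scores
    indicate disagreement, which may reflect an ambiguous sentence.
    """
    counts = Counter(r for chosen in worker_annotations for r in chosen)
    norm = math.sqrt(sum(counts[r] ** 2 for r in relations))
    if norm == 0:
        return {r: 0.0 for r in relations}
    return {r: counts[r] / norm for r in relations}


if __name__ == "__main__":
    # Made-up annotations from four workers for a single sentence.
    annotations = [{"cause"}, {"cause"}, {"cause", "treat"}, {"none"}]
    print(sentence_relation_scores(annotations, ["cause", "treat", "none"]))
```

Under these assumptions, a graded score can be thresholded to get a binary label for training, or kept as a weight when evaluating predictions.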

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1701.02185/full.md

## Figures

15 figures with captions in the complete paper: https://tomesphere.com/paper/1701.02185/full.md

## References

32 references — full list in the complete paper: https://tomesphere.com/paper/1701.02185/full.md

---
Source: https://tomesphere.com/paper/1701.02185