Analyzing the Granularity and Cost of Annotation in Clinical Sequence Labeling

Haozhan Sun; Chenchen Xu; Hanna Suominen

arXiv:2108.09913 · cs.CL · August 24, 2021 · 1 citation

TL;DR

This study investigates how annotation granularity affects machine learning performance in clinical sequence labeling, suggesting that less detailed annotations may be more cost-effective without sacrificing accuracy.

Contribution

The paper analyzes how annotation granularity affects ML performance on clinical text and offers guidelines for making annotation efforts more cost-effective.

Findings

1. Annotation granularity has limited impact on ML performance.
2. Adding a nurse's manual annotations leaves sequence tagging performance nearly unchanged.
3. Less detailed annotations are recommended for cost efficiency.

Abstract

Well-annotated datasets, as recent top studies have shown, are more important than ever for researchers in supervised machine learning (ML). However, the dataset annotation process and its associated human labor costs remain overlooked. In this work, we analyze the relationship between annotation granularity and ML performance in sequence labeling, using clinical records from nursing shift-change handover. We first study a model derived from textual language features alone, without additional information based on nursing knowledge, and find that this sequence tagger performs well in most categories at this granularity. We then include additional manual annotations by a nurse and find that sequence tagging performance remains nearly the same. Finally, we give a guideline and reference to the community arguing that it is not necessary, and even not recommended, to…
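
One way to make "annotation granularity" concrete is to score the same tagger output at two label levels. The sketch below is illustrative only and is not the authors' tagger, category set, or evaluation protocol: it assumes a hypothetical two-level tag scheme where a fine tag is written "<Coarse>_<Detail>" (for example "Name_First"), uses made-up toy tokens, and uses token-level F1 per category as an assumed metric. The helper names `coarsen` and `per_category_f1` are hypothetical.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical two-level tag scheme (NOT the paper's categories):
# a fine tag is "<Coarse>_<Detail>", and "O" marks tokens outside any category.


def coarsen(tag: str) -> str:
    """Project a fine-grained tag onto its coarse category (the prefix before '_')."""
    return tag if tag == "O" else tag.split("_", 1)[0]


def per_category_f1(gold: List[str], pred: List[str]) -> Dict[str, float]:
    """Token-level F1 per non-'O' category for aligned gold and predicted tag sequences."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            if g != "O":
                tp[g] += 1  # correctly tagged non-'O' token
        else:
            if p != "O":
                fp[p] += 1  # predicted a category the gold tag does not have
            if g != "O":
                fn[g] += 1  # missed the gold category
    scores: Dict[str, float] = {}
    for cat in sorted((set(gold) | set(pred)) - {"O"}):
        denom = 2 * tp[cat] + fp[cat] + fn[cat]
        scores[cat] = 2 * tp[cat] / denom if denom else 0.0
    return scores


# Toy, made-up tokens annotated at the fine granularity.
gold = ["Name_First", "Name_Last", "O", "Med_Drug", "Med_Dose"]
pred = ["Name_First", "Name_First", "O", "Med_Drug", "O"]

fine_scores = per_category_f1(gold, pred)
coarse_scores = per_category_f1([coarsen(t) for t in gold], [coarsen(t) for t in pred])
print("fine:", fine_scores)
print("coarse:", coarse_scores)
```

On this toy input the coarse "Name" score is 1.0 even though the fine "Name_Last" score is 0.0, because confusing first and last names stops counting as an error once both map to "Name". Whether that lost detail matters in practice is the kind of question the paper weighs against annotation cost.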

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Machine Learning in Healthcare