A Study on Agreement in PICO Span Annotations
Grace E. Lee, Aixin Sun

TL;DR
This study investigates how consistently different human annotators identify PICO elements in medical texts. It finds that annotators disagree substantially on span boundaries but broadly agree on the general span areas, which affects how annotation agreement should be evaluated.
Contribution
It provides a detailed analysis of inter-annotator agreement for PICO span annotations, highlighting the need for combined agreement measures for better evaluation.
Findings
Annotators show high variability in span boundaries.
Broad agreement exists on general span areas.
Standard agreement measures may underestimate true agreement.
Abstract
In evidence-based medicine, relevance of medical literature is determined by predefined relevance conditions. The conditions are defined based on PICO elements, namely, Patient, Intervention, Comparator, and Outcome. Hence, PICO annotations in medical literature are essential for automatic relevant document filtering. However, defining boundaries of text spans for PICO elements is not straightforward. In this paper, we study the agreement of PICO annotations made by multiple human annotators, including both experts and non-experts. Agreements are estimated by a standard span agreement (i.e., matching both labels and boundaries of text spans), and two types of relaxed span agreement (i.e., matching labels without guaranteeing matching boundaries of spans). Based on the analysis, we report two observations: (i) Boundaries of PICO span annotations by individual human annotators are very…
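The difference between standard and relaxed span agreement can be sketched in a few lines of Python. This is an illustration only, not the authors' method. The text above does not specify the two relaxed variants, so the overlap criterion used here is an assumption. The `Span` type, the function names, the F1-style score, and the toy annotations are all hypothetical.

```python
from typing import Callable, List, NamedTuple


class Span(NamedTuple):
    start: int  # inclusive token index
    end: int    # exclusive token index
    label: str  # e.g. "P", "I", "C", "O"


def exact_match(a: Span, b: Span) -> bool:
    # Standard agreement: same label and identical boundaries.
    return a == b


def overlap_match(a: Span, b: Span) -> bool:
    # Assumed relaxed agreement: same label and at least one shared token.
    return a.label == b.label and a.start < b.end and b.start < a.end


def agreement_f1(spans_a: List[Span], spans_b: List[Span],
                 match: Callable[[Span, Span], bool]) -> float:
    # F1-style pairwise agreement, treating annotator B as the reference.
    if not spans_a or not spans_b:
        return 0.0
    precision = sum(any(match(a, b) for b in spans_b) for a in spans_a) / len(spans_a)
    recall = sum(any(match(b, a) for a in spans_a) for b in spans_b) / len(spans_b)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data: the annotators differ by one token on the Patient span only.
annotator_1 = [Span(0, 5, "P"), Span(8, 11, "I")]
annotator_2 = [Span(1, 5, "P"), Span(8, 11, "I")]

strict = agreement_f1(annotator_1, annotator_2, exact_match)
relaxed = agreement_f1(annotator_1, annotator_2, overlap_match)
print(f"strict={strict:.2f} relaxed={relaxed:.2f}")
```

In this toy case a one-token boundary difference halves the strict score while the relaxed score stays at 1.0, which is the kind of gap that makes boundary-sensitive measures look pessimistic.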
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Biomedical Text Mining and Ontologies
