A Corpus with Multi-Level Annotations of Patients, Interventions and Outcomes to Support Language Processing for Medical Literature

Benjamin Nye; Junyi Jessy Li; Roma Patel; Yinfei Yang; Iain J. Marshall; Ani Nenkova; Byron C. Wallace

arXiv:1806.04185·cs.CL·June 13, 2018

TL;DR

This paper introduces a richly annotated corpus of 5,000 medical abstracts with detailed PICO element annotations, supporting advanced NLP tasks for medical literature analysis and evidence-based medicine.

Contribution

It provides a large, multi-level annotated dataset of clinical trial abstracts with structured PICO elements and demonstrates its utility for NLP applications in medicine.

Findings

01 Annotated corpus of 5,000 abstracts with PICO elements

02 Annotations include granular medical vocabulary mappings

03 Supports development of NLP tools for medical literature

Abstract

We present a corpus of 5,000 richly annotated abstracts of medical articles describing clinical randomized controlled trials. Annotations include demarcations of text spans that describe the Patient population enrolled, the Interventions studied and to what they were Compared, and the Outcomes measured (the 'PICO' elements). These spans are further annotated at a more granular level, e.g., individual interventions within them are marked and mapped onto a structured medical vocabulary. We acquired annotations from a diverse set of workers with varying levels of expertise and cost. We describe our data collection process and the corpus itself in detail. We then outline a set of challenging NLP tasks that would aid searching of the medical literature and the practice of evidence-based medicine.
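The two annotation levels described in the abstract can be pictured as nested spans: a coarse PICO span that contains finer-grained spans, each optionally linked to a vocabulary entry. The following is a minimal sketch of that idea only. The class names, field names, labels, token-offset convention, example sentence, and the placeholder identifier are all assumptions made for illustration; they are not the corpus's actual file format or label set.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FineSpan:
    """Hypothetical fine-grained annotation nested inside a PICO span."""
    start: int                      # token offset, inclusive
    end: int                        # token offset, exclusive
    label: str                      # illustrative label, e.g. "drug"
    vocab_id: Optional[str] = None  # placeholder for a structured-vocabulary ID


@dataclass
class PicoSpan:
    """Hypothetical coarse span marking a P, I, or O description."""
    start: int
    end: int
    element: str                    # "P", "I", or "O"
    children: List[FineSpan] = field(default_factory=list)

    def add_child(self, child: FineSpan) -> None:
        # A fine-grained span must be non-empty and lie inside its parent.
        if not (self.start <= child.start < child.end <= self.end):
            raise ValueError("fine-grained span must lie within its parent span")
        self.children.append(child)


def span_text(tokens: List[str], span) -> str:
    """Return the text covered by a span as a space-joined token string."""
    return " ".join(tokens[span.start:span.end])


# Invented example sentence, not taken from the corpus.
tokens = "Adults with asthma received inhaled budesonide or placebo".split()

# Coarse level: the intervention description, including the comparator.
intervention = PicoSpan(start=4, end=8, element="I")

# Fine level: one individual intervention inside the coarse span.
intervention.add_child(
    FineSpan(start=4, end=6, label="drug", vocab_id="PLACEHOLDER-ID")
)

print(span_text(tokens, intervention))
print(span_text(tokens, intervention.children[0]))
```

Keeping the fine-grained spans as children of the coarse span makes the containment constraint explicit, which is one simple way to represent "spans further annotated at a more granular level".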

Code & Models

Datasets

mitclinicalml/clinical-ie (dataset, 718 downloads)
