The Leaf Clinical Trials Corpus: a new resource for query generation from clinical trial eligibility criteria

Nicholas J Dobbins; Tony Mullen; Ozlem Uzuner; Meliha Yetisgen

arXiv:2207.13757 · cs.CL · July 29, 2022

TL;DR

This paper introduces the Leaf Clinical Trials corpus, a detailed, human-annotated dataset of clinical trial eligibility criteria designed to facilitate NLP-based query generation for patient recruitment.

Contribution

The paper presents a new, richly annotated corpus of clinical trial eligibility criteria, along with its annotation schema, annotation process, and baseline information extraction results, to support future NLP research.

Findings

01

The corpus contains over 1,000 annotated clinical trial eligibility criteria descriptions.

02

Baseline information extraction results demonstrate the corpus's utility.

03

A detailed schema and annotation process help ensure high-quality annotations.

Abstract

Identifying cohorts of patients based on eligibility criteria such as medical conditions, procedures, and medication use is critical to recruitment for clinical trials. Such criteria are often most naturally described in free text, using language familiar to clinicians and researchers. To identify potential participants at scale, these criteria must first be translated into queries on clinical databases, which can be labor-intensive and error-prone. Natural language processing (NLP) methods offer a potential means of automatically converting such criteria into database queries. However, they must first be trained and evaluated using corpora that capture clinical trial criteria in sufficient detail. In this paper, we introduce the Leaf Clinical Trials (LCT) corpus, a human-annotated corpus of over 1,000 clinical trial eligibility criteria descriptions using highly granular structured…
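To make the translation step described in the abstract concrete, here is a minimal Python sketch of turning already-structured eligibility criteria into a database query. It assumes a simplified, hypothetical representation: the `Criterion` class, the entity types `Condition` and `Drug`, the `to_sql` helper, and the tables and columns (`patient`, `diagnosis`, `medication`, `dx_code`, `med_code`) are illustrative assumptions, not the LCT annotation schema or the authors' query generator. Each inclusion criterion becomes an `EXISTS` subquery and each exclusion becomes a `NOT EXISTS` subquery, with codes passed as bound parameters rather than spliced into the SQL text.

```python
import sqlite3
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Criterion:
    # Hypothetical structured form of one eligibility criterion; these entity
    # types are illustrative only and are NOT the LCT annotation schema.
    entity_type: str       # "Condition" or "Drug"
    code: str              # diagnosis or medication code to match
    exclude: bool = False  # True for exclusion criteria


# Hypothetical mapping from entity type to (table, code column).
TABLES = {
    "Condition": ("diagnosis", "dx_code"),
    "Drug": ("medication", "med_code"),
}


def to_sql(criteria: List[Criterion]) -> Tuple[str, List[str]]:
    """Build a parameterized query for patients meeting every criterion."""
    clauses, params = [], []
    for c in criteria:
        table, column = TABLES[c.entity_type]
        op = "NOT EXISTS" if c.exclude else "EXISTS"
        clauses.append(
            f"{op} (SELECT 1 FROM {table} t "
            f"WHERE t.patient_id = p.patient_id AND t.{column} = ?)"
        )
        params.append(c.code)
    # With no criteria, every patient qualifies.
    where = " AND ".join(clauses) if clauses else "1 = 1"
    return f"SELECT p.patient_id FROM patient p WHERE {where}", params


def demo() -> List[int]:
    """Run the generated query against a toy in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE patient (patient_id INTEGER PRIMARY KEY);
        CREATE TABLE diagnosis (patient_id INTEGER, dx_code TEXT);
        CREATE TABLE medication (patient_id INTEGER, med_code TEXT);
        INSERT INTO patient VALUES (1), (2), (3);
        INSERT INTO diagnosis VALUES (1, 'E11'), (2, 'E11'), (3, 'I10');
        INSERT INTO medication VALUES (2, 'insulin');
    """)
    sql, params = to_sql([
        Criterion("Condition", "E11"),               # inclusion criterion
        Criterion("Drug", "insulin", exclude=True),  # exclusion criterion
    ])
    rows = conn.execute(sql + " ORDER BY p.patient_id", params).fetchall()
    conn.close()
    return [r[0] for r in rows]


if __name__ == "__main__":
    print(demo())  # [1]
```

In the toy data, patients 1 and 2 have the included diagnosis code, but patient 2 is excluded by the medication criterion, so only patient 1 is returned. The hard part the corpus targets, extracting such structure reliably from free-text criteria, is outside the scope of this sketch.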

Code & Models

Repositories

uw-bionlp/clinical-trials-gov-data (Official)