Automatically Labeling Clinical Trial Outcomes: A Large-Scale Benchmark for Drug Development

Chufan Gao; Jathurshan Pradeepkumar; Trisha Das; Shivashankar Thati; Jimeng Sun

arXiv:2406.10292·cs.AI·March 7, 2025

TL;DR

This paper introduces the CTO benchmark, a large-scale, reproducible dataset of clinical trial outcomes integrating multiple data sources, and demonstrates its high accuracy and utility for drug development research.

Contribution

The paper presents the CTO benchmark, a large-scale dataset of automatically labeled clinical trial outcomes supplemented by a manually annotated set of trials conducted between 2020 and 2024, and reports high agreement between the automated labels and expert annotations.

Findings

01

F1 score of 94 for Phase 3 trial labels

02

Strong agreement between automated labels and expert annotations

03

Distribution shifts observed in recent trial data
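The first two findings compare automated labels against expert annotations. The sketch below shows one way such a comparison could be computed. The toy label lists are invented for illustration and are not data from the paper, and Cohen's kappa is an assumed choice of agreement statistic (the page does not name the one the authors used). The reported F1 of 94 is presumably on a 0-100 scale, while the function here returns a value between 0 and 1.

```python
from collections import Counter


def f1_score(expert, automated, positive=1):
    """F1 of the automated labels, treating the expert labels as ground truth."""
    pairs = list(zip(expert, automated))
    tp = sum(1 for e, a in pairs if e == positive and a == positive)
    fp = sum(1 for e, a in pairs if e != positive and a == positive)
    fn = sum(1 for e, a in pairs if e == positive and a != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def cohen_kappa(expert, automated):
    """Chance-corrected agreement between two label sequences."""
    n = len(expert)
    observed = sum(e == a for e, a in zip(expert, automated)) / n
    expert_counts = Counter(expert)
    auto_counts = Counter(automated)
    labels = set(expert_counts) | set(auto_counts)
    expected = sum(expert_counts[l] * auto_counts[l] for l in labels) / (n * n)
    if expected == 1.0:
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical labels: 1 = trial succeeded, 0 = trial failed.
expert_labels = [1, 1, 1, 0, 0, 1, 0, 1]
automated_labels = [1, 1, 0, 0, 0, 1, 1, 1]

f1 = f1_score(expert_labels, automated_labels)
kappa = cohen_kappa(expert_labels, automated_labels)
print(f"F1 = {f1:.3f}, kappa = {kappa:.3f}")
```

F1 measures how well the automated labels recover the positive class, whereas kappa discounts the agreement expected by chance, so the two can diverge when outcomes are imbalanced.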

Abstract

Background: The cost of drug discovery and development is substantial, with clinical trial outcomes playing a critical role in regulatory approval and patient care. However, access to large-scale, high-quality clinical trial outcome data remains limited, hindering advancements in predictive modeling and evidence-based decision-making.

Methods: We present the Clinical Trial Outcome (CTO) benchmark, a fully reproducible, large-scale repository encompassing approximately 125,000 drug and biologics trials. CTO integrates large language model (LLM) interpretations of publications, trial phase progression tracking, sentiment analysis from news sources, stock price movements of trial sponsors, and additional trial-related metrics. Furthermore, we manually annotated a dataset of clinical trials conducted between 2020 and 2024 to enhance the quality and reliability of outcome labels.

Results…
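The abstract lists several noisy signals that are integrated into one outcome label per trial. As a rough illustration of what integrating such signals can look like, the sketch below combines per-source votes by simple majority. This is an assumption made for illustration: the abstract does not say how CTO combines its sources, and the trial IDs, source names, and vote values here are hypothetical.

```python
ABSTAIN = -1  # a source has no opinion about this trial


def majority_vote(votes):
    """Combine weak votes (1 = success, 0 = failure, ABSTAIN = no signal).

    Returns ABSTAIN when the non-abstaining votes are tied or absent.
    """
    positive = sum(1 for v in votes if v == 1)
    negative = sum(1 for v in votes if v == 0)
    if positive == negative:
        return ABSTAIN
    return 1 if positive > negative else 0


# Hypothetical per-trial votes from the kinds of sources the abstract names.
weak_votes = {
    "TRIAL_A": {"llm_publication": 1, "phase_progression": 1,
                "news_sentiment": ABSTAIN, "stock_movement": 0},
    "TRIAL_B": {"llm_publication": 0, "phase_progression": 0,
                "news_sentiment": 0, "stock_movement": ABSTAIN},
    "TRIAL_C": {"llm_publication": 1, "phase_progression": ABSTAIN,
                "news_sentiment": ABSTAIN, "stock_movement": 0},
}

aggregated = {
    trial_id: majority_vote(list(signals.values()))
    for trial_id, signals in weak_votes.items()
}
print(aggregated)
```

A plain majority vote treats every source as equally reliable. A weighted scheme that estimates per-source accuracy would be a natural refinement, but it is beyond what the abstract describes.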

Code & Models

Datasets

chufangao/CTO
dataset · 464 dl

Taxonomy

Topics: Health Systems, Economic Evaluations, Quality of Life · Economic and Financial Impacts of Cancer

Methods: Balanced Selection · Sparse Evolutionary Training