Automatically Labeling Clinical Trial Outcomes: A Large-Scale Benchmark for Drug Development
Chufan Gao, Jathurshan Pradeepkumar, Trisha Das, Shivashankar Thati, Jimeng Sun

TL;DR
This paper introduces the CTO benchmark, a large-scale, reproducible dataset of clinical trial outcomes integrating multiple data sources, and demonstrates its high accuracy and utility for drug development research.
Contribution
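The paper presents the CTO benchmark, a comprehensive dataset of automatically labeled clinical trial outcomes, supplements it with a manually annotated set of trials conducted between 2020 and 2024, and reports high agreement between the automated labels and expert annotations.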
Findings
F1 score of 94 for Phase 3 trial labels
Strong agreement between automated labels and expert annotations
Distribution shifts observed in recent trial data
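The agreement metric behind the first two findings can be illustrated with a minimal sketch of how an F1 score between automated labels and expert annotations is computed. The label lists below are made-up toy values, not data from the paper, and the function name `f1_score` and the convention that 1 means a successful trial are assumptions for illustration only. The reported F1 of 94 is presumably on a 0 to 100 scale, whereas this sketch returns a value between 0 and 1.

```python
def f1_score(expert, automated, positive=1):
    """Binary F1 of automated labels measured against expert labels."""
    pairs = list(zip(expert, automated))
    # True positives: both sources mark the trial as positive.
    tp = sum(1 for e, a in pairs if e == positive and a == positive)
    # False positives: automated says positive, expert disagrees.
    fp = sum(1 for e, a in pairs if e != positive and a == positive)
    # False negatives: expert says positive, automated misses it.
    fn = sum(1 for e, a in pairs if e == positive and a != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy labels (1 = trial succeeded, 0 = trial failed).
expert_labels = [1, 1, 1, 0, 0, 1, 0, 1]
automated_labels = [1, 1, 0, 0, 1, 1, 0, 1]

score = f1_score(expert_labels, automated_labels)
print(f"F1 = {score:.2f}")
```

F1 is the harmonic mean of precision and recall, so it stays low unless the automated labels both avoid false successes and recover most of the expert-confirmed successes.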
Abstract
Background: The cost of drug discovery and development is substantial, with clinical trial outcomes playing a critical role in regulatory approval and patient care. However, access to large-scale, high-quality clinical trial outcome data remains limited, hindering advancements in predictive modeling and evidence-based decision-making.
Methods: We present the Clinical Trial Outcome (CTO) benchmark, a fully reproducible, large-scale repository encompassing approximately 125,000 drug and biologics trials. CTO integrates large language model (LLM) interpretations of publications, trial phase progression tracking, sentiment analysis from news sources, stock price movements of trial sponsors, and additional trial-related metrics. Furthermore, we manually annotated a dataset of clinical trials conducted between 2020 and 2024 to enhance the quality and reliability of outcome labels.
Results…
Taxonomy
Topics: Health Systems, Economic Evaluations, Quality of Life · Economic and Financial Impacts of Cancer
Methods: Balanced Selection · Sparse Evolutionary Training
