Statistical NLP for Optimization of Clinical Trial Success Prediction in Pharmaceutical R&D

Michael R. Doane

arXiv:2512.00586·cs.LG·December 2, 2025

TL;DR

This paper develops NLP-based models, including a BioBERT-based classifier, to predict clinical trial success in neuroscience, significantly improving prediction accuracy and reducing error compared to traditional methods.

Contribution

It introduces a novel NLP-enabled probabilistic classifier using domain-specific language models for clinical trial success prediction in neuroscience.

Findings

01

The BioBERT model achieved a ROC-AUC of 0.74, outperforming traditional models.

02

NLP features improved prediction accuracy and calibration.

03

Predictions were 70% better than industry benchmarks.

Abstract

This work presents the development and evaluation of an NLP-enabled probabilistic classifier designed to estimate the probability of technical and regulatory success (pTRS) for clinical trials in the field of neuroscience. While pharmaceutical R&D is plagued by high attrition rates and enormous costs, particularly within neuroscience, where success rates are below 10%, timely identification of promising programs can streamline resource allocation and reduce financial risk. Leveraging data from the ClinicalTrials.gov database and success labels from the recently developed Clinical Trial Outcome dataset, the classifier extracts text-based clinical trial features using statistical NLP techniques. These features were integrated into several non-LLM frameworks (logistic regression, gradient boosting, and random forest) to generate calibrated probability scores. Model performance was assessed…
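The abstract is cut off before it names the evaluation metrics, but the findings above report ROC-AUC and calibration. As a rough illustration of how predicted success probabilities can be scored on those two axes, here is a minimal standard-library sketch. The Brier score is an assumption on our part as one common calibration-sensitive measure; the paper may use a different one. The function names and the toy labels and probabilities are hypothetical and are not taken from the paper.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def brier_score(labels, probs):
    """Mean squared error between predicted probabilities and 0/1 outcomes.

    Lower is better; 0.0 means every prediction was exactly right.
    """
    if len(labels) != len(probs):
        raise ValueError("labels and probs must have the same length")
    return sum((p - y) ** 2 for y, p in zip(labels, probs)) / len(labels)


# Hypothetical toy data: 1 = trial succeeded, 0 = trial failed.
toy_labels = [1, 0, 0, 1, 0, 0]
# Hypothetical predicted success probabilities from some classifier.
toy_probs = [0.8, 0.3, 0.6, 0.4, 0.1, 0.2]

toy_auc = roc_auc(toy_labels, toy_probs)
toy_brier = brier_score(toy_labels, toy_probs)
print(f"ROC-AUC: {toy_auc:.3f}  Brier score: {toy_brier:.3f}")
```

ROC-AUC measures only how well the model ranks successes above failures, while the Brier score also penalizes probabilities that are over- or under-confident, which is why the two are often reported together for probabilistic classifiers.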

Taxonomy

Topics: Machine Learning in Healthcare · Meta-analysis and systematic reviews · Biomedical Text Mining and Ontologies