Statistical NLP for Optimization of Clinical Trial Success Prediction in Pharmaceutical R&D
Michael R. Doane

TL;DR
This paper develops NLP-based models, including a BioBERT-based classifier, to predict clinical trial success in neuroscience, improving prediction accuracy and reducing prediction error relative to traditional methods.
Contribution
It introduces a novel NLP-enabled probabilistic classifier using domain-specific language models for clinical trial success prediction in neuroscience.
Findings
The BioBERT-based model achieved a ROC-AUC of 0.74, outperforming the traditional models.
NLP features improved prediction accuracy and calibration.
Predictions were 70% better than industry benchmarks.
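The findings are stated in terms of ROC-AUC (how well the scores rank successful trials above failed ones) and calibration (whether a predicted probability of, say, 0.3 corresponds to roughly 30% observed success). The dependency-free sketch below shows what these quantities measure. ROC-AUC is computed as the probability that a randomly chosen successful trial outscores a randomly chosen failed one. The Brier score and an equal-width reliability table are shown as common calibration checks; the specific calibration metric behind the findings is not given in this summary. The function names and the toy labels and probabilities are hypothetical, and this is not the paper's evaluation code.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as P(score of a random positive > score of a random negative).

    Ties count as half a win (Mann-Whitney U formulation). The pairwise loop is
    O(n_pos * n_neg): fine for illustration, too slow for large datasets.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def brier_score(labels: Sequence[int], probs: Sequence[float]) -> float:
    """Mean squared error between predicted probabilities and 0/1 outcomes (lower is better)."""
    return sum((p - y) ** 2 for y, p in zip(labels, probs)) / len(labels)


def reliability_table(labels, probs, n_bins: int = 5):
    """Group predictions into equal-width probability bins.

    Returns (mean predicted probability, observed success rate, count)
    for each non-empty bin; a well-calibrated model has the first two close.
    """
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(labels, probs):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((y, p))
    table = []
    for b in bins:
        if b:
            mean_p = sum(p for _, p in b) / len(b)
            rate = sum(y for y, _ in b) / len(b)
            table.append((mean_p, rate, len(b)))
    return table


# Hypothetical toy outcomes (1 = trial succeeded) and predicted success probabilities.
labels = [1, 0, 1, 0, 0, 1, 0, 0]
probs = [0.8, 0.3, 0.6, 0.4, 0.1, 0.35, 0.2, 0.5]

print(roc_auc(labels, probs))      # ≈ 0.867 (13 of 15 success/failure pairs ranked correctly)
print(brier_score(labels, probs))  # ≈ 0.147
for mean_p, rate, count in reliability_table(labels, probs):
    print(f"predicted {mean_p:.2f}  observed {rate:.2f}  n={count}")
```

The two views are complementary: a model can rank trials well (high ROC-AUC) while still being over- or under-confident, which only the calibration checks reveal.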
Abstract
This work presents the development and evaluation of an NLP-enabled probabilistic classifier designed to estimate the probability of technical and regulatory success (pTRS) for clinical trials in neuroscience. Pharmaceutical R&D is plagued by high attrition rates and enormous costs, particularly in neuroscience, where success rates are below 10%; timely identification of promising programs can therefore streamline resource allocation and reduce financial risk. Using data from the ClinicalTrials.gov database and success labels from the recently developed Clinical Trial Outcome dataset, the classifier extracts text-based clinical trial features with statistical NLP techniques. These features are integrated into several non-LLM frameworks (logistic regression, gradient boosting, and random forest) to generate calibrated probability scores. Model performance was assessed…
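The abstract describes extracting text features with statistical NLP and feeding them to classical classifiers that output probability scores. The sketch below illustrates one common instance of that idea, TF-IDF features plus L2-regularised logistic regression, in dependency-free Python. Everything in it is an assumption for illustration: the tokenizer, the function names (`tokenize`, `fit_idf`, `tfidf`, `train_logreg`, `predict_proba`), the hyperparameters, and the four toy trial descriptions and labels are hypothetical and are not taken from the paper, which also evaluates gradient boosting, random forests, and a BioBERT-based model. A real pipeline would more likely use scikit-learn's `TfidfVectorizer`, `LogisticRegression`, and `CalibratedClassifierCV`, and would still need to check calibration on held-out trials, since raw logistic-regression scores are not guaranteed to be well calibrated.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus: short trial descriptions with made-up success labels.
TRAIN_TEXTS = [
    "randomized placebo controlled trial with biomarker enriched population",
    "biomarker guided dosing randomized study",
    "open label study terminated early due to slow enrollment",
    "single arm study terminated for futility",
]
TRAIN_LABELS = [1, 1, 0, 0]  # 1 = success, 0 = failure (illustrative only)


def tokenize(text: str) -> list[str]:
    """Lowercase and split on runs of non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def fit_idf(docs: list[list[str]]) -> dict[str, float]:
    """Smoothed inverse document frequency: log((1 + N) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + d)) + 1.0 for term, d in df.items()}


def tfidf(doc: list[str], idf: dict[str, float]) -> dict[str, float]:
    """Sparse, L2-normalised TF-IDF vector; terms unseen during fitting are dropped."""
    counts = Counter(t for t in doc if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(x: dict[str, float], w: dict[str, float], b: float) -> float:
    """Probability of success: sigmoid of the weighted sum of TF-IDF features."""
    return sigmoid(b + sum(w.get(t, 0.0) * v for t, v in x.items()))


def train_logreg(X, y, lr=0.5, l2=0.01, epochs=500):
    """Batch gradient descent on mean log-loss plus an (l2 / 2) * ||w||^2 penalty.

    The bias term b is not regularised.
    """
    w: dict[str, float] = {}
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w: dict[str, float] = {}
        grad_b = 0.0
        for x, target in zip(X, y):
            err = predict_proba(x, w, b) - target  # derivative of log-loss w.r.t. the logit
            grad_b += err
            for t, v in x.items():
                grad_w[t] = grad_w.get(t, 0.0) + err * v
        for t in set(w) | set(grad_w):
            w_t = w.get(t, 0.0)
            w[t] = w_t - lr * (grad_w.get(t, 0.0) / n + l2 * w_t)
        b -= lr * grad_b / n
    return w, b


train_tokens = [tokenize(t) for t in TRAIN_TEXTS]
idf = fit_idf(train_tokens)
X_train = [tfidf(doc, idf) for doc in train_tokens]
weights, bias = train_logreg(X_train, TRAIN_LABELS)

# Score two new (equally hypothetical) descriptions.
p_pos = predict_proba(tfidf(tokenize("biomarker enriched randomized trial"), idf), weights, bias)
p_neg = predict_proba(tfidf(tokenize("study terminated early"), idf), weights, bias)
print(f"{p_pos:.3f} {p_neg:.3f}")
```

Hand-rolled gradient descent keeps the example self-contained and makes explicit that each probability score is the sigmoid of a weighted sum of term weights, so the learned `weights` dictionary can be inspected directly to see which terms push a trial toward predicted success or failure.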
Taxonomy
Topics: Machine Learning in Healthcare · Meta-analysis and systematic reviews · Biomedical Text Mining and Ontologies
