Early Risk Stratification of Dosing Errors in Clinical Trials Using Machine Learning
Félicien Hêche, Sohrab Ferdowsi, Anthony Yazdani, Sara Sansaloni-Pastor, Douglas Teodoro

TL;DR
This study presents a machine learning framework that predicts, before a clinical trial starts, the risk that it will exhibit a high rate of dosing errors, enabling proactive quality management and improved safety.
Contribution
It introduces a scalable, multimodal ML approach combining structured and textual data for early risk stratification of clinical trials.
Findings
The late-fusion model achieved an AUC-ROC of 0.862.
Calibrated outputs enabled reliable risk categorization.
Simple multimodal integration improved performance.
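As a rough illustration of the late-fusion and risk-categorization findings above, the sketch below combines per-trial probabilities from two models and maps the result to a risk tier. The source does not specify the fusion rule or the tier cutoffs, so the weighted average, the `weight` parameter, and the `low_cut`/`high_cut` values are all assumptions made for illustration, not the authors' method.

```python
def late_fusion(p_structured, p_text, weight=0.5):
    """Combine two lists of per-trial probabilities by weighted averaging.

    Assumption: a weighted average is only one possible late-fusion rule;
    the source says "simple late-fusion" without giving the formula.
    """
    if len(p_structured) != len(p_text):
        raise ValueError("both models must score the same trials")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return [weight * a + (1.0 - weight) * b
            for a, b in zip(p_structured, p_text)]


def risk_tier(probability, low_cut=0.2, high_cut=0.6):
    """Map a (calibrated) probability to a tier.

    Assumption: the cutoffs 0.2 and 0.6 are hypothetical placeholders.
    """
    if probability < low_cut:
        return "low"
    if probability < high_cut:
        return "medium"
    return "high"


# Hypothetical scores for two trials from the structured and text models.
fused = late_fusion([0.2, 0.8], [0.4, 0.6])
tiers = [risk_tier(p) for p in fused]
```

Tiers of this kind are only meaningful if the fused probabilities are calibrated, which is why the summary pairs risk categorization with calibrated outputs.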
Abstract
Objective: The objective of this study is to develop a machine learning (ML)-based framework for early risk stratification of clinical trials (CTs) according to their likelihood of exhibiting a high rate of dosing errors, using information available prior to trial initiation. Materials and Methods: We constructed a dataset from ClinicalTrials.gov comprising 42,112 CTs. Structured, semi-structured trial data, and unstructured protocol-related free-text data were extracted. CTs were assigned binary labels indicating elevated dosing error rate, derived from adverse event reports, MedDRA terminology, and Wilson confidence intervals. We evaluated an XGBoost model trained on structured features, a ClinicalModernBERT model using textual data, and a simple late-fusion model combining both modalities. Post-hoc probability calibration was applied to enable interpretable, trial-level risk…
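The abstract mentions Wilson confidence intervals as one ingredient of the binary labels. A minimal sketch of the standard Wilson score interval follows; the labeling rule shown (flag a trial when the interval's lower bound exceeds a threshold) and the `threshold` value are assumptions for illustration, since the abstract does not state the exact rule.

```python
import math


def wilson_interval(errors, n, z=1.96):
    """Wilson score interval for a binomial proportion errors / n.

    z=1.96 corresponds to an approximate 95% interval.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= errors <= n:
        raise ValueError("errors must lie in [0, n]")
    p = errors / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2.0 * n)) / denom
    half = z * math.sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def has_elevated_error_rate(errors, n, threshold, z=1.96):
    """Hypothetical labeling rule: positive if the lower bound exceeds threshold.

    Assumption: this rule and the threshold are illustrative, not the
    authors' stated definition.
    """
    lower, _ = wilson_interval(errors, n, z)
    return lower > threshold


# Same observed rate (10%), different sample sizes, hypothetical threshold 5%.
small_trial = has_elevated_error_rate(1, 10, threshold=0.05)
large_trial = has_elevated_error_rate(100, 1000, threshold=0.05)
```

Using the interval's lower bound rather than the raw rate keeps small trials with a handful of reported events from being labeled positive on weak evidence: in the example, both trials have an observed rate of 10%, but only the larger one clears the threshold.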
Taxonomy
Topics: Statistical Methods in Clinical Trials · Advanced Causal Inference Techniques · Artificial Intelligence in Healthcare and Education
