# Prodrug-ML: prodrug-likeness prediction via machine learning on sampled negative decoys

**Authors:** Sadettin Y. Ugurlu, Shan He

PMC · DOI: 10.1007/s10822-025-00725-x · Journal of Computer-Aided Molecular Design · 2026-01-10

## TL;DR

Prodrug-ML is a machine learning tool that helps identify promising prodrug candidates by efficiently screening chemical structures and reducing the need for expensive lab experiments.

## Contribution

Prodrug-ML introduces a novel machine learning framework with reliable negative examples and cross-decoy validation to improve prodrug screening efficiency.

## Key findings

- Prodrug-ML achieves high early retrieval performance with EF@1% ≈ 6–8 and EF@5% ≈ 5–6.
- The model demonstrates strong discrimination with BEDROC scores up to 0.99 and ROC AUC ≈ 0.86–0.87.
- Using Prodrug-ML can reduce experimental time and cost by up to 97–98% by focusing on top-ranked candidates.
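
The enrichment factor quoted above compares the hit rate in the top-ranked fraction of a screen with the hit rate in the whole library. The following is a minimal sketch of that calculation, assuming binary labels (1 = prodrug, 0 = decoy) and higher scores meaning "more prodrug-like". The function name `enrichment_factor` and the toy ranking are illustrative assumptions, not code or data from the paper.

```python
def enrichment_factor(scores, labels, fraction):
    """EF@fraction: hit rate in the top-ranked fraction over the overall hit rate."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    n = len(scores)
    hits_total = sum(labels)
    if hits_total == 0:
        raise ValueError("at least one positive is required")
    # Size of the early tranche, never smaller than one compound.
    n_top = max(1, round(fraction * n))
    # Rank compounds from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits_top = sum(label for _, label in ranked[:n_top])
    # (hits_top / n_top) / (hits_total / n), written to avoid rounding noise.
    return (hits_top * n) / (n_top * hits_total)


# Hypothetical toy ranking: 100 compounds, 10 positives, 3 of them in the top 5.
scores = [100 - i for i in range(100)]
positive_ranks = {0, 2, 4, 10, 20, 30, 40, 50, 60, 70}
labels = [1 if i in positive_ranks else 0 for i in range(100)]

ef_at_1 = enrichment_factor(scores, labels, 0.01)
ef_at_5 = enrichment_factor(scores, labels, 0.05)
print(f"EF@1% = {ef_at_1:.1f}, EF@5% = {ef_at_5:.1f}")
```

An EF of 1 corresponds to random ranking, and the maximum attainable value at a given fraction is capped by the base rate of positives, so EF values are only comparable between benchmarks with similar positive-to-decoy ratios.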

## Abstract

A prodrug is a pharmacologically inactive (or attenuated) derivative that undergoes bioreversible transformation in vivo to release an active parent drug, enabling temporary optimization of properties such as solubility, permeability, and targeting. Despite expanding catalogs of known prodrugs, in silico screening remains limited by the absence of reliable negative examples: training/evaluation sets often contain only positives or ad hoc decoys, leading to class imbalance, property-mismatch shortcuts, and irreproducible benchmarks. This lack of reliable negatives has so far left the field without an efficient machine learning-based prodrug screening approach. We therefore introduce Prodrug-ML, an efficient machine learning-based screen for prodrug-likeness that prioritizes candidates rather than asserting mechanistic truth. Prodrug-ML helps medicinal chemists triage prodrugging ideas during hit-to-lead and lead optimization, filter enumerated libraries of promoiety–attachment variants before ADMET assays, and retrospectively mine internal/ChEMBL-like collections to surface likely prodrug chemotypes. In practice, users (i) generate or collect candidate structures (e.g., parent drug ± promoieties), (ii) score them with Prodrug-ML, and (iii) advance only high-scoring candidates to synthesis/assay, thereby reducing wet-lab load while maintaining chemical diversity. To support this usage, the Prodrug-ML framework, which uses LightGBM as its default classifier, addresses these issues through (i) construction of three complementary, property-controlled negative cohorts (DUD-E–style near-misses, random ChEMBL, and strictly filtered ChEMBL), (ii) hardness control and label-noise guardrails on decoys, (iii) domain-bias control, and (iv) cross-decoy validation with multimodel feature selection. Prodrug-ML was evaluated five times on hold-out data and on an unseen test benchmark after training on 80% of the data.
In the benchmarks, the multimodel ensemble consistently improves early retrieval and overall discrimination, attaining EF@1% ≈ 6–8, EF@5% ≈ 5–6, BEDROC₂₀ ≈ 0.78–0.82, BEDROC₅₀ ≈ 0.90–0.95, and BEDROC₈₀ ≈ 0.95–0.99, alongside ROC AUC ≈ 0.86–0.87, average precision ≈ 0.60–0.65, and F1 ≈ 0.58–0.62. These results, especially the high BEDROC scores, are consistent with concentrating at least one prodrug within the top ~2–3% of ranked candidates, implying ~97–98% reductions in experimental time and cost when using standard wet-lab workflows that assay only the early tranche.
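
The BEDROC subscripts in the abstract (20, 50, 80) are values of the early-recognition parameter α. The sketch below assumes the standard Truchon–Bayly definition, in which the robust initial enhancement (RIE) is rescaled between its minimum and maximum so that the score lies in [0, 1]. The summary does not state which implementation the authors used, so the function name `bedroc` and the toy rankings are assumptions for illustration only.

```python
import math


def bedroc(scores, labels, alpha=20.0):
    """BEDROC for binary labels; higher scores are ranked first."""
    n_total = len(scores)
    if n_total != len(labels):
        raise ValueError("scores and labels must have the same length")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    # 1-based ranks of the positives in the sorted list.
    active_ranks = [rank for rank, (_, label) in enumerate(ranked, start=1) if label]
    n_actives = len(active_ranks)
    if n_actives == 0 or n_actives == n_total:
        raise ValueError("need at least one positive and one negative")
    ratio = n_actives / n_total
    # Exponentially weighted sum: early ranks contribute most.
    exp_sum = sum(math.exp(-alpha * rank / n_total) for rank in active_ranks)
    # Expected per-active weight under a uniformly random ranking.
    random_weight = (1 - math.exp(-alpha)) / (n_total * (math.exp(alpha / n_total) - 1))
    rie = exp_sum / (n_actives * random_weight)
    # RIE for the best and worst possible rankings, used to rescale to [0, 1].
    rie_max = (1 - math.exp(-alpha * ratio)) / (ratio * (1 - math.exp(-alpha)))
    rie_min = (1 - math.exp(alpha * ratio)) / (ratio * (1 - math.exp(alpha)))
    return (rie - rie_min) / (rie_max - rie_min)


# Hypothetical toy data: 100 compounds, 10 positives.
toy_scores = [100 - i for i in range(100)]
perfect = [1] * 10 + [0] * 90            # all positives ranked first
worst = [0] * 90 + [1] * 10              # all positives ranked last
mixed = [1 if i in {0, 2, 4, 10, 20, 30, 40, 50, 60, 70} else 0 for i in range(100)]

for alpha in (20.0, 50.0, 80.0):
    print(alpha, round(bedroc(toy_scores, mixed, alpha), 3))
```

Larger α places more of the weight on the very top of the ranked list, so BEDROC values obtained at different α are not directly comparable with one another.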

## Full-text entities

- **Genes:** SHROOM4 (shroom family member 4) [NCBI Gene 57477] {aka MRXSSDS, SHAP, shrm4}, MFSD11 (major facilitator superfamily domain containing 11) [NCBI Gene 79157] {aka ET}
- **Diseases:** toxicity (MESH:D064420), ML (MESH:C537366), hypoxia (MESH:D000860)
- **Chemicals:** forEF@1 (-), epoxides (MESH:D004852), isocyanates (MESH:D017953), peptide (MESH:D010455), carbamate (MESH:D002219), ZINC (MESH:D015032), prontosil (MESH:C003359), formaldehyde (MESH:D005557), catechol (MESH:C034221), hydroxamic acids (MESH:D006877), quinone (MESH:C004532), piperazine (MESH:D000077489), phosphonate (MESH:D063065), aziridines (MESH:D001388), oximes (MESH:D010091), amino acid (MESH:D000596), amide (MESH:D000577), phenol (MESH:D019800), piperidine (MESH:C032727), carbonate (MESH:D002254), aldehydes (MESH:D000447), alcohol (MESH:D000438), hydrogen (MESH:D006859), sulfonyl chloride (MESH:C044255), anhydrides (MESH:D000812), imines (MESH:D007097), chloroacetamide (MESH:C013874), vinyl sulfone (MESH:C009873), methenamine (MESH:D008709), sulfonamide (MESH:D013449), ester (MESH:D004952), amine (MESH:D000588), phosphate (MESH:D010710), sulfanilamide (MESH:D000077145), Si (MESH:D012825), acid (MESH:D000143), OH (MESH:C031356), O (MESH:D010100)
- **Species:** Homo sapiens (human, species) [taxon 9606]
- **Mutations:** C3C, C1)C, CC1)CC

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12790554/full.md

## Figures

24 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12790554/full.md

## References

1 reference — full list in the complete paper: https://tomesphere.com/paper/PMC12790554/full.md

---
Source: https://tomesphere.com/paper/PMC12790554