# Selective Cleaning Enhances Machine Learning Accuracy for Drug Repurposing: Multiscale Discovery of MDM2 Inhibitors

**Authors:** Mohammad Firdaus Akmal, Ming Wah Wong

PMC · DOI: 10.3390/molecules30142992 · 2025-07-16

## TL;DR

This study improves drug repurposing for cancer by using a selective cleaning algorithm to boost machine learning accuracy in identifying MDM2 inhibitors.

## Contribution

A selective cleaning algorithm is introduced to enhance machine learning accuracy in drug repurposing for MDM2 inhibition.

## Key findings

- Selective cleaning reduced RMSE by 21.6% and achieved R² = 0.87 in predicting pIC50 values.
- Three clinically tested compounds (the CB1 antagonists MePPEP and otenabant, and the statin atorvastatin) were identified as promising MDM2 inhibitors with high predicted potency and binding affinity.
- Quantum mechanical and molecular dynamics simulations confirmed stable interactions of selected compounds with MDM2.
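
The metrics in the first finding can be made concrete with a short sketch. pIC50 is the negative base-10 logarithm of the IC50 expressed in mol/L, and RMSE and R² are computed in the standard way. The function names and the numeric values in the usage note are illustrative assumptions, not code or data from the paper.

```python
import math

def pic50_from_ic50_nm(ic50_nm: float) -> float:
    """Convert an IC50 given in nanomolar to pIC50 = -log10(IC50 in mol/L)."""
    return -math.log10(ic50_nm * 1e-9)

def rmse(y_true, y_pred) -> float:
    """Root-mean-square error between measured and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

def r_squared(y_true, y_pred) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def relative_reduction_percent(before: float, after: float) -> float:
    """Percentage by which `after` is lower than `before`."""
    return 100.0 * (before - after) / before
```

For example, an IC50 of 1000 nM corresponds to a pIC50 of 6, and an RMSE falling from a hypothetical 1.000 to 0.784 is a 21.6% reduction; the paper's absolute RMSE values are not given in this summary.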

## Abstract

Cancer remains one of the most formidable challenges to human health; hence, developing effective treatments is critical for saving lives. An important strategy involves reactivating tumor suppressor genes, particularly p53, which is essential in promoting cell cycle arrest and apoptosis, by targeting its negative regulator MDM2. Leveraging a drug repurposing approach, we screened over 24,000 clinically tested molecules to identify new MDM2 inhibitors. A key innovation of this work is the development and application of a selective cleaning algorithm that systematically filters assay data to mitigate noise and inconsistencies inherent in large-scale bioactivity datasets. This approach significantly improved the predictive accuracy of our machine learning model for pIC50 values, reducing RMSE by 21.6% and achieving state-of-the-art performance (R² = 0.87), a substantial improvement over standard data preprocessing pipelines. The optimized model was integrated with structure-based virtual screening via molecular docking to prioritize repurposing candidate compounds. We identified two clinical CB1 antagonists, MePPEP and otenabant, and the statin drug atorvastatin as promising repurposing candidates based on their high predicted potency and binding affinity toward MDM2. Interactions with the related proteins MDM4 and BCL2 suggest these compounds may enhance p53 restoration through multi-target mechanisms. Quantum mechanical (ONIOM) optimizations and molecular dynamics simulations confirmed the stability and favorable interaction profiles of the selected protein–ligand complexes, resembling those of navtemadlin, a known MDM2 inhibitor. This multiscale, accuracy-boosted workflow introduces a novel data-curation strategy that substantially enhances AI model performance and enables efficient drug repurposing against challenging cancer targets.
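
The abstract does not spell out how the selective cleaning algorithm decides which assay records to keep. As a rough illustration of the general idea of filtering noisy bioactivity data, the sketch below applies a generic replicate-consistency filter: compounds whose repeated pIC50 measurements disagree by more than a threshold are dropped, and the rest are averaged. This is an assumption made for illustration, not the authors' algorithm; the function name, the record layout, and the `max_spread` threshold are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

def filter_inconsistent(records, max_spread=1.0):
    """Keep compounds whose replicate pIC50 values agree within max_spread.

    records: iterable of (compound_id, pic50) pairs (hypothetical layout).
    Returns {compound_id: mean pIC50} for the compounds that pass.
    NOTE: a generic consistency filter, not the paper's selective cleaning.
    """
    grouped = defaultdict(list)
    for compound_id, pic50 in records:
        grouped[compound_id].append(pic50)

    cleaned = {}
    for compound_id, values in grouped.items():
        # Spread is measured in log units; single measurements have spread 0.
        if max(values) - min(values) <= max_spread:
            cleaned[compound_id] = mean(values)
    return cleaned

# Made-up example records: compound "B" has conflicting measurements.
example = [("A", 6.0), ("A", 6.4), ("B", 5.0), ("B", 8.0), ("C", 7.2)]
cleaned_example = filter_inconsistent(example)
```

With these made-up records, compound "B" is discarded because its two measurements differ by three log units, while "A" and "C" are retained. A real pipeline would also need to reconcile assay types, units, and censored values, none of which this sketch handles.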

## Linked entities

- **Genes:** TP53 (tumor protein p53) [NCBI Gene 7157], MDM2 (MDM2 proto-oncogene) [NCBI Gene 4193], MDM4 (MDM4 regulator of p53) [NCBI Gene 4194], BCL2 (BCL2 apoptosis regulator) [NCBI Gene 596]
- **Proteins:** MDM2 (MDM2 proto-oncogene), MDM4 (MDM4 regulator of p53), BCL2 (BCL2 apoptosis regulator)
- **Chemicals:** MePPEP (PubChem CID 25107855), otenabant (PubChem CID 10052040), atorvastatin (PubChem CID 60823), navtemadlin (PubChem CID 58573469)
- **Diseases:** cancer (MONDO:0004992)

## Full-text entities

- **Genes:** MDM4 (MDM4 regulator of p53) [NCBI Gene 4194] {aka BMFS6, HDMX, MDMX, MRP1}, BCL2 (BCL2 apoptosis regulator) [NCBI Gene 596] {aka Bcl-2, PPP1R50}, MDM2 (MDM2 proto-oncogene) [NCBI Gene 4193] {aka ACTFS, HDMX, LSKB, hdm2}, TP53 (tumor protein p53) [NCBI Gene 7157] {aka BCC7, BMFS5, LFS1, P53, TRP53}, CNR1 (cannabinoid receptor 1) [NCBI Gene 1268] {aka CANN6, CB-R, CB1, CB1A, CB1K5, CB1R}
- **Diseases:** Cancer (MESH:D009369)
- **Chemicals:** atorvastatin (MESH:D000069059), MePPEP (-), navtemadlin (MESH:C588087)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

14 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12299252/full.md

---
Source: https://tomesphere.com/paper/PMC12299252