# Enhancing cancer drug discovery: QSAR modeling with machine learning and chemical representations

**Authors:** Raúl Acosta-Murillo, José Carlos Ortiz-Bayliss, Patricio Adrian Zapata-Morin

PMC · DOI: 10.1371/journal.pone.0343654 · PLOS One · 2026-03-17

## TL;DR

This study benchmarks 12 chemical representations paired with 12 machine learning regressors for predicting the bioactivity of small molecules against cancer therapeutic targets.

## Contribution

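The paper evaluates molecular fingerprints, descriptors, and embeddings in combination with linear, kernel, tree-ensemble, and neural-network regressors for cancer bioactivity prediction, reporting both the best single combination and the best dataset-level average.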

## Key findings

- The Avalon (AVN) representation combined with a Support Vector Regressor (SVR) achieved the highest predictive accuracy (R2 of 0.735) on the FGFR1 dataset.
- The mTOR dataset showed the best average performance across all models and chemical representations, with an R2 of 0.592.
- Cheminformatics tools like QSAR modeling can enhance cancer drug discovery.
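The R2 values above are coefficients of determination. This summary does not say how the authors computed them (for example, on a held-out test set or under cross-validation), so the following is only a minimal sketch of the standard definition, R2 = 1 - SS_res / SS_tot. The function name `r_squared` and the toy values are illustrative assumptions, not taken from the paper.

```python
def r_squared(y_true, y_pred):
    """Standard coefficient of determination: 1 - SS_res / SS_tot.

    y_true: measured bioactivity values
    y_pred: model predictions for the same compounds
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error left over after the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all true values are identical")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Toy values for illustration only; not data from the paper.
    measured = [1.0, 2.0, 3.0]
    print(r_squared(measured, [1.0, 2.0, 3.0]))  # perfect predictions
    print(r_squared(measured, [2.0, 2.0, 2.0]))  # always predicting the mean
```

Under this definition, a model that always predicts the mean scores 0 and a perfect model scores 1, so an R2 of 0.735 means the model accounts for roughly three quarters of the variance in the measured bioactivities.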

## Abstract

Accurately predicting the bioactivity of small molecules against cancer therapeutic targets remains a significant challenge at the intersection of cheminformatics and drug discovery. This study comprehensively evaluates chemical representations, including AtomPair Counts (APC), Avalon (AVN), Extended-Connectivity Fingerprint diameter 4 (ECFP4), Extended-Connectivity Fingerprint diameter 6 (ECFP6), Feature-based Morgan 2 (FM2), Feature-based Morgan 3 (FM3), Mol2Vec (M2V), Molecular ACCess System (MACCS), Mordred 2D Chi Kappa (MK2), RDKFingerprint (RDF), RDKit PhysChem (RDC), and Torsion (TSN), combined with machine learning algorithms (Bayesian Ridge (BRG), Elastic Net (ENT), Extra Trees (ETT), Hist Gradient Boosting (HGT), K-Nearest Neighbors (kNN), Lasso (LSS), Multi-layer Perceptron (MLP), Partial Least Squares (PLS), Random Forest (RFT), Ridge (RDG), Support Vector Regressor (SVR), and XGBoost (XGB)) for predicting cancer bioactivities. The results show that the AVN chemical representation, in conjunction with the SVR algorithm, achieved the highest predictive accuracy, with an R2 of 0.735 on the FGFR1 dataset, while the mTOR dataset demonstrated the highest average performance among the cancer datasets, with an R2 of 0.592 across all models and chemical representations. These findings demonstrate how cheminformatics tools like molecular fingerprints and quantitative structure-activity relationship (QSAR) modeling can significantly enhance bioactivity prediction, ultimately contributing to more efficient and targeted cancer drug discovery.
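The abstract reports two different kinds of result: the best single representation/algorithm pair, and the best average over all pairs for a dataset. The sketch below enumerates the grid implied by the abbreviations listed in the abstract and shows how the two summaries differ. It is an illustration under assumptions, not the authors' code: the helper names (`evaluation_grid`, `best_pair`, `dataset_average`) are hypothetical, and the `scores` mapping from (representation, algorithm) to R2 would have to come from actual model training, which is not shown.

```python
from itertools import product
from statistics import mean

# Abbreviations exactly as listed in the abstract.
REPRESENTATIONS = ["APC", "AVN", "ECFP4", "ECFP6", "FM2", "FM3",
                   "M2V", "MACCS", "MK2", "RDF", "RDC", "TSN"]
ALGORITHMS = ["BRG", "ENT", "ETT", "HGT", "kNN", "LSS",
              "MLP", "PLS", "RFT", "RDG", "SVR", "XGB"]


def evaluation_grid():
    """Every (representation, algorithm) pair in the listed sets."""
    return list(product(REPRESENTATIONS, ALGORITHMS))


def best_pair(scores):
    """Return ((representation, algorithm), r2) for the top-scoring pair.

    scores: hypothetical dict mapping (representation, algorithm) -> R2
    for one dataset.
    """
    return max(scores.items(), key=lambda item: item[1])


def dataset_average(scores):
    """Mean R2 over all evaluated pairs for one dataset."""
    return mean(scores.values())


if __name__ == "__main__":
    print(len(evaluation_grid()))  # 144 pairs: 12 representations x 12 algorithms
```

Keeping the two summaries as separate functions makes the distinction in the findings explicit: one dataset can contain the single best pair (FGFR1 with AVN and SVR) while another has the higher average over the whole grid (mTOR).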

## Linked entities

- **Proteins:** FGFR1 (fibroblast growth factor receptor 1), MTOR (mechanistic target of rapamycin kinase)
- **Diseases:** cancer (MONDO:0004992)

## Full-text entities

- **Genes:** CHEK1 (checkpoint kinase 1) [NCBI Gene 1111] {aka CHK1, OZEMA21}, ERBB2 (erb-b2 receptor tyrosine kinase 2) [NCBI Gene 2064] {aka CD340, HER-2, HER-2/neu, HER2, MLN 19, MLN-19}, PIK3CB (phosphatidylinositol-4,5-bisphosphate 3-kinase catalytic subunit beta) [NCBI Gene 5291] {aka P110BETA, PI3K, PI3KBETA, PIK3C1}, COL11A2 (collagen type XI alpha 2 chain) [NCBI Gene 1302] {aka DFNA13, DFNB53, FBCG2, HKE5, OSMEDA, OSMEDB}, FGFR1 (fibroblast growth factor receptor 1) [NCBI Gene 2260] {aka BFGFR, CD331, CEK, ECCL, FGFBR, FGFR-1}, APC (APC regulator of Wnt signaling pathway) [NCBI Gene 324] {aka BTPS2, DESMD, DP2, DP2.5, DP3, GS}, MTOR (mechanistic target of rapamycin kinase) [NCBI Gene 2475] {aka FRAP, FRAP1, FRAP2, RAFT1, RAPT1, SKS}, FGFR3 (fibroblast growth factor receptor 3) [NCBI Gene 2261] {aka ACH, CD333, CEK2, HSFGFR3EX, JTK4}
- **Diseases:** toxicity (MESH:D064420), metastasis (MESH:D009362), breast, prostate, and lung cancer (MESH:D001943), inflammatory (MESH:D007249), Cancer (MESH:D009369)
- **Chemicals:** AVN (-), salts (MESH:D012492)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12994839/full.md

## Figures

25 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12994839/full.md

## References

87 references — full list in the complete paper: https://tomesphere.com/paper/PMC12994839/full.md

---
Source: https://tomesphere.com/paper/PMC12994839