# Anticancer drug synergy prediction based on CatBoost

**Authors:** Changheng Li, Nana Guan, Hongyi Zhang

PMC · DOI: 10.7717/peerj-cs.2829 · 2025-05-19

## TL;DR

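This paper introduces a CatBoost-based machine learning model that predicts synergy scores of anticancer drug combinations on cancer cell lines. It reaches a ROC AUC of 0.9217 in stratified 5-fold cross-validation and uses SHAP analysis to identify genes that contribute to drug synergy predictions.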

## Contribution

A novel CatBoost-based model for predicting anticancer drug synergy that outperforms three other advanced models, combined with SHAP analysis that provides biological insight into its predictions.

## Key findings

- The model achieved a ROC AUC of 0.9217 and outperformed three other advanced models in predicting drug synergy.
- Drug features were more influential than cell line features in predicting synergy, according to SHAP analysis.
- Genes like PTK2, CCND1, and GNA11 were identified as important in drug synergy prediction.
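SHAP attributes each prediction to individual features using Shapley values. As a minimal illustration, assuming a toy model and a zero baseline rather than the paper's trained CatBoost model, the sketch below computes exact Shapley values by enumerating feature coalitions. It then sums absolute values per feature group, which is one simple way to compare drug features against cell line features. The model `toy_synergy_model`, its coefficients, and the helper names are hypothetical. For real tree ensembles, the `shap` library's `TreeExplainer` computes these values efficiently instead of by brute force.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, List, Sequence


def shapley_values(model: Callable[[List[float]], float],
                   x: Sequence[float],
                   baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values for one sample (exponential cost; illustration only).

    Features outside a coalition are set to their baseline value. This value
    function is an assumption, not necessarily what the paper used.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                present = set(subset)
                without_i = [x[j] if j in present else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]  # add feature i to the coalition
                phi[i] += weight * (model(with_i) - model(without_i))
    return phi


def group_importance(phi: Sequence[float],
                     groups: Dict[str, List[int]]) -> Dict[str, float]:
    """Sum of absolute Shapley values per named feature group (hypothetical helper)."""
    return {name: sum(abs(phi[j]) for j in idx) for name, idx in groups.items()}


# Hypothetical toy model: features 0-1 describe the two drugs, feature 2 a cell line gene.
def toy_synergy_model(v: List[float]) -> float:
    return 2.0 * v[0] + 1.5 * v[1] + 0.3 * v[2] + 0.5 * v[0] * v[2]


phi = shapley_values(toy_synergy_model, [1.0, 1.0, 1.0], [0.0, 0.0, 0.0])
print(phi)  # approximately [2.25, 1.5, 0.55]
print(group_importance(phi, {"drug": [0, 1], "cell_line": [2]}))
```

The values sum to the model output at `x` minus the output at the baseline (here 4.3), and the interaction term `0.5 * v[0] * v[2]` is split equally between features 0 and 2.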

## Abstract

Cancer treatment has long been a major research focus in medicine. Multi-targeted drug combinations are considered an ideal option for cancer treatment. Because neither clinical experience nor high-throughput screening can feasibly cover the complete combinatorial space, methods such as machine learning models offer a way to explore that space effectively.

In this work, we propose a machine learning method based on CatBoost to predict the synergy scores of anticancer drug combinations on cancer cell lines. The method uses oblivious trees and the ordered boosting technique to avoid overfitting and bias. The model was trained and tested on data screened from the NCI-ALMANAC dataset. Drugs were characterized by Morgan fingerprints, drug target information, and monotherapy information, and cell lines were described by gene expression profiles.
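An oblivious (symmetric) tree, which corresponds to CatBoost's default `grow_policy='SymmetricTree'`, applies the same feature-threshold test at every node of a given depth. A depth-d tree therefore has 2^d leaves, and the leaf index is the d-bit number formed by the test outcomes. The pure-Python sketch below illustrates that inference step, together with a hypothetical way to lay out one input row as drug A features, drug B features, and cell line expression values. The feature layout, thresholds, leaf values, comparison convention, and helper names are assumptions for illustration, not the paper's trained model or CatBoost's internal code. Ordered boosting, which concerns how residuals are estimated during training, is not shown.

```python
from dataclasses import dataclass
from typing import List, Sequence


def build_row(drug_a: Sequence[float], drug_b: Sequence[float],
              cell_expr: Sequence[float]) -> List[float]:
    """Hypothetical layout: [drug A features | drug B features | cell line expression]."""
    return [*drug_a, *drug_b, *cell_expr]


@dataclass
class ObliviousTree:
    """Symmetric tree: one (feature, threshold) test shared by all nodes at each depth."""
    features: List[int]       # feature index tested at depth 0, 1, ...
    thresholds: List[float]   # threshold used at the same depth
    leaf_values: List[float]  # 2 ** depth leaf outputs

    def __post_init__(self) -> None:
        if len(self.features) != len(self.thresholds):
            raise ValueError("one threshold per depth level is required")
        if len(self.leaf_values) != 2 ** len(self.features):
            raise ValueError("an oblivious tree of depth d has 2**d leaves")

    def leaf_index(self, x: Sequence[float]) -> int:
        # Each depth contributes one bit: 1 if the value exceeds the threshold
        # (the strict ">" comparison is an assumed convention).
        idx = 0
        for level, (f, t) in enumerate(zip(self.features, self.thresholds)):
            if x[f] > t:
                idx |= 1 << level
        return idx

    def predict(self, x: Sequence[float]) -> float:
        return self.leaf_values[self.leaf_index(x)]


def ensemble_predict(trees: Sequence[ObliviousTree], x: Sequence[float],
                     base: float = 0.0) -> float:
    """Boosted ensemble output: a base value plus the sum of the tree outputs."""
    return base + sum(tree.predict(x) for tree in trees)


# Toy row: two fingerprint bits per drug plus one expression value (hypothetical).
row = build_row([1, 0], [0, 1], [2.5])
tree = ObliviousTree(features=[0, 4], thresholds=[0.5, 1.0],
                     leaf_values=[-0.2, 0.1, 0.3, 0.8])
print(tree.leaf_index(row), tree.predict(row))  # 3 0.8
```

Because every node at a given depth shares one test, evaluating a tree is a fixed sequence of d comparisons and bit operations. This regular structure is often cited as a reason symmetric trees are fast at inference and act as a form of regularization.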

In stratified 5-fold cross-validation, our method obtained excellent results: a receiver operating characteristic area under the curve (ROC AUC) of 0.9217, a precision-recall area under the curve (PR AUC) of 0.4651, a mean squared error (MSE) of 0.1365, and a Pearson correlation coefficient of 0.5335. This performance is significantly better than that of three other advanced models. Additionally, when using SHapley Additive exPlanations (SHAP) to interpret the biological significance of the predictions, we found that drug features played a more prominent role than cell line features, and that genes associated with cancer development, such as PTK2, CCND1, and GNA11, played an important part in drug synergy prediction. Taken together, the experimental results show that the proposed model predicts well and can serve as an alternative method for predicting anticancer drug combinations.
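The evaluation above combines stratified 5-fold cross-validation with classification metrics (ROC AUC, PR AUC) and regression metrics (MSE, Pearson correlation). A minimal pure-Python sketch of these computations follows. It assumes that continuous synergy scores have been binarized into synergistic (1) and non-synergistic (0) labels for stratification and for the AUC metrics; this summary does not state the threshold. It also uses average precision as the PR AUC estimator, which may differ from the paper's exact choice. All function names are hypothetical. In practice, scikit-learn's `StratifiedKFold`, `roc_auc_score`, `average_precision_score`, and `mean_squared_error` cover the same ground.

```python
import math
import random
from collections import defaultdict
from typing import List, Sequence

# Hypothetical helpers; binary labels assume a synergy-score threshold not given here.


def stratified_folds(labels: Sequence[int], k: int = 5, seed: int = 0) -> List[List[int]]:
    """Split sample indices into k folds with (near-)equal class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds: List[List[int]] = [[] for _ in range(k)]
    offset = 0
    for y in sorted(by_class):
        idx = by_class[y]
        rng.shuffle(idx)
        # Deal each class round-robin; the offset keeps overall fold sizes balanced.
        for j, i in enumerate(idx):
            folds[(j + offset) % k].append(i)
        offset += len(idx)
    return folds


def roc_auc(y: Sequence[int], s: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [v for v, t in zip(s, y) if t == 1]
    neg = [v for v, t in zip(s, y) if t == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y: Sequence[int], s: Sequence[float]) -> float:
    """Mean precision at the rank of each positive (ties keep input order)."""
    order = sorted(range(len(s)), key=lambda i: s[i], reverse=True)
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y[i] == 1:
            hits += 1
            total += hits / rank
    if hits == 0:
        raise ValueError("no positive labels")
    return total / hits


def mse(y: Sequence[float], p: Sequence[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(y, p)) / len(y)


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (z - mb) for x, z in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((z - mb) ** 2 for z in b)
    return cov / math.sqrt(va * vb)


labels = [0] * 10 + [1] * 5
for fold in stratified_folds(labels, k=5):
    print(sorted(labels[i] for i in fold))  # each fold: [0, 0, 1]
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Computing each metric per fold and averaging is one common way to report cross-validation results; pooling the out-of-fold predictions is another, and the summary does not say which the authors used.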

## Linked entities

- **Genes:** PTK2 (protein tyrosine kinase 2) [NCBI Gene 5747], CCND1 (cyclin D1) [NCBI Gene 595], GNA11 (G protein subunit alpha 11) [NCBI Gene 2767]
- **Diseases:** cancer (MONDO:0004992)

## Full-text entities

- **Genes:** CCND1 (cyclin D1) [NCBI Gene 595] {aka BCL1, D11S287E, PRAD1, U21B31}, GNA11 (G protein subunit alpha 11) [NCBI Gene 2767] {aka FBH, FBH2, FHH2, GNA-11, HG1K, HHC2}, PTK2 (protein tyrosine kinase 2) [NCBI Gene 5747] {aka FADK, FADK 1, FAK, FAK1, FRNK, PPP1R71}
- **Diseases:** cancer (MESH:D009369)
- **Chemicals:** CatBoost (-)

## Figures

38 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12190655/full.md

---
Source: https://tomesphere.com/paper/PMC12190655