# GenReP: An Ensemble Model for Predicting TP53 in Response to Pharmaceutical Compounds

**Authors:** Austin Spadaro, Alok Sharma, Iman Dehzangi

PMC · DOI: 10.3390/molecules31040739 · Molecules · 2026-02-21

## TL;DR

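This paper introduces GenReP, an ensemble machine-learning model that predicts the direction of TP53 relative gene expression in response to pharmaceutical compounds. TP53 is a tumor-suppressor gene whose mutations are implicated in approximately half of all detected cancers.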

## Contribution

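The authors present GenReP as the first predictor of TP53 gene regulation in response to pharmaceutical compounds, together with a new benchmark dataset derived from the Connectivity Map (CMap) database. The stand-alone predictor, its source code, and the dataset are publicly available.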

## Key findings

- GenReP achieves 62.9% accuracy, 93.9% sensitivity, 40.3% specificity, and a Matthews Correlation Coefficient (MCC) of 0.39 in predicting the direction of TP53 relative gene expression.
- The model uses molecular fingerprints, descriptors, and scaffold-based features extracted from compound SMILES representations and concatenated into a single feature vector.
- A new benchmark dataset was built from the Connectivity Map (CMap) database; class imbalance was addressed with the Synthetic Minority Over-sampling Technique (SMOTE).
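
To make the class-imbalance step concrete, the following is a minimal standard-library sketch of SMOTE-style interpolation. It is not the authors' code: the function name `smote_sketch`, the toy 2-D vectors, and the parameter values are illustrative assumptions, and a real pipeline would typically use an established implementation and oversample only the training split.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by SMOTE-style interpolation.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        # Pick one of the k nearest neighbours at random.
        neighbour = minority[rng.choice(others[:k])]
        # Place the new point somewhere on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy 2-D vectors standing in for the concatenated compound features (assumed).
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
majority_count = 10  # assumed size of the larger class
new_points = smote_sketch(minority, n_new=majority_count - len(minority), k=2)
balanced_minority = minority + new_points
print(len(balanced_minority))
```

Each synthetic point lies on a line segment between two existing minority samples, so the oversampled class stays inside the region the real samples already span rather than duplicating them exactly.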

## Abstract

TP53 is a tumor-suppressor gene involved in regulating apoptosis, DNA repair, and genomic stability. Mutations in TP53 are implicated in approximately half of all detected cancers, including breast, lung, colorectal, and ovarian cancers, making it a significant target for therapeutic interventions. Many pharmaceutical drugs aim to restore TP53 function, and there is a need for predictive tools to assess how compounds may affect TP53 expression. In this study, we propose a new ensemble machine-learning model to predict the direction of TP53 relative gene expression in response to pharmaceutical compounds. Our model utilizes molecular fingerprints, descriptors, and scaffold-based features extracted from SMILES representations of compounds concatenated into a single feature vector. Trained using our newly generated benchmark dataset based on the Connectivity Map (CMap) database and addressing class imbalance with the Synthetic Minority Over-sampling Technique (SMOTE), our model achieves 62.9%, 93.9%, 40.3%, and 0.39 in terms of accuracy, sensitivity, specificity, and Matthews Correlation Coefficient (MCC), respectively. As the first-of-its-kind TP53 gene regulation prediction, our study serves as a convincing proof-of-concept that paves the way for future investigation. GenReP as a stand-alone predictor, its source code, and our newly generated benchmark dataset are publicly available.
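
As a back-of-envelope check on how the four reported metrics relate, the sketch below recovers the implied confusion-matrix fractions from accuracy, sensitivity, and specificity, then recomputes MCC from them. It assumes all four figures were measured on the same evaluation set with the standard definitions; the helper names `implied_confusion` and `mcc` are illustrative and not from the paper.

```python
import math


def implied_confusion(accuracy, sensitivity, specificity):
    """Recover confusion-matrix fractions (summing to 1) from three metrics."""
    # accuracy = p * sensitivity + (1 - p) * specificity, solved for p,
    # where p is the fraction of positive samples in the evaluation set.
    pos_frac = (accuracy - specificity) / (sensitivity - specificity)
    tp = pos_frac * sensitivity
    fn = pos_frac * (1 - sensitivity)
    tn = (1 - pos_frac) * specificity
    fp = (1 - pos_frac) * (1 - specificity)
    return tp, fp, tn, fn


def mcc(tp, fp, tn, fn):
    """Matthews Correlation Coefficient from confusion-matrix entries."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Values reported in the abstract.
tp, fp, tn, fn = implied_confusion(0.629, 0.939, 0.403)
implied_mcc = mcc(tp, fp, tn, fn)
print(f"positive fraction ~ {tp + fn:.3f}, implied MCC ~ {implied_mcc:.2f}")
```

Under these assumptions the implied positive fraction is about 0.42 and the recomputed MCC rounds to the reported 0.39. The gap between 93.9% sensitivity and 40.3% specificity means the model labels most samples positive, which is why accuracy sits well below sensitivity.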

## Linked entities

- **Genes:** TP53 (tumor protein p53) [NCBI Gene 7157]
- **Diseases:** breast cancer (MONDO:0004989), lung cancer (MONDO:0005138), colorectal cancer (MONDO:0005575), ovarian cancer (MONDO:0005140)

## Full-text entities

- **Genes:** TP53 (tumor protein p53) [NCBI Gene 7157] {aka BCC7, BMFS5, LFS1, P53, TRP53}
- **Diseases:** breast, lung, colorectal, and ovarian cancers (MESH:D010051), cancer (MESH:D009369), injury to (MESH:D014947)
- **Chemicals:** n-octanol (MESH:D020003), Hydrogen (MESH:D006859), polystyrene (MESH:D011137), nitrogen (MESH:D009584), oxygen (MESH:D010100), water (MESH:D014867)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12943182/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12943182/full.md

## References

80 references — full list in the complete paper: https://tomesphere.com/paper/PMC12943182/full.md

---
Source: https://tomesphere.com/paper/PMC12943182