# Predicting DNA damage response using synthetic cell painting profiles and experimental analysis

**Authors:** Chaeyoung Seo, Hyemin Lim, Zanyue Piao, Yeong Jun Koh, Seung Jin Lee

PMC · DOI: 10.1016/j.isci.2026.115000 · 2026-02-11

## TL;DR

This paper introduces a machine learning framework that augments real Cell Painting profiles with synthetic ones to predict DNA damage response (DDR), and validates its predictions experimentally.

## Contribution

A DDR prediction framework that augments real Cell Painting profiles with Gaussian copula-generated synthetic data, with predictions validated experimentally by γH2AX and cell viability assays.

## Key findings

- An SVM trained on real data augmented with Gaussian copula-generated synthetic data achieved an F1-score of 0.87 and an AUROC of 0.94, the best among the four classifiers evaluated.
- The Gaussian copula outperformed CTGAN, VAE, and CopulaGAN in preserving morphological fidelity.
- The model identified known and previously unreported DDR inducers in the external cpg-0012 dataset; these were confirmed experimentally by γH2AX accumulation and reduced cell viability.
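
For readers less familiar with the two reported metrics, the following is a minimal sketch of how an F1-score and an AUROC are computed for a binary DDR classifier. The labels and scores are made up for illustration and are not taken from the paper; the function names are hypothetical helpers, not the authors' code.

```python
def f1_score(y_true, y_pred):
    """F1: harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auroc(y_true, scores):
    """AUROC: probability that a random positive scores above a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up example: 1 = DDR inducer, 0 = non-inducer (imbalanced on purpose).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.4, 0.2, 0.1, 0.05, 0.6]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 threshold is an assumption

f1 = f1_score(y_true, y_pred)
auc = auroc(y_true, scores)
print(f"F1 = {f1:.3f}, AUROC = {auc:.3f}")
```

F1 depends on the chosen decision threshold, while AUROC summarizes ranking quality across all thresholds, which is one reason the two are often reported together for imbalanced classes.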

## Abstract

Detecting DNA damage response (DDR) using cell painting profiles is challenging due to limited sample sizes and skewed class distributions. We established a robust classification framework to enhance DDR prediction based on synthetic data. Using the idr-0080 dataset, we generated synthetic profiles with the Gaussian copula, CTGAN, VAE, and CopulaGAN algorithms, and assessed their quality through fidelity metrics. Among four classifiers evaluated with real and/or synthetic data under preserved or resolved class imbalance, an SVM trained on real data augmented by Gaussian copula-generated synthetic data achieved the best performance (F1-score = 0.87, AUROC = 0.94). SHAP analysis highlighted key predictive morphological features. The model successfully identified known and previously unreported DDR inducers in the external cpg-0012 dataset, which were experimentally confirmed by γH2AX marker accumulation and reduced cell viability. Overall, our machine learning framework integrating synthetic cell painting profiles effectively predicted DDR, providing a scalable virtual prescreening approach for drug discovery.
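
The abstract names the Gaussian copula as the best-performing generator, but this summary does not describe how it was implemented. As a rough illustration of the general technique only, the sketch below fits a Gaussian copula to a small made-up two-feature table using the Python standard library. Each column is mapped to normal scores through its ranks, the correlation of those scores is estimated, and new rows are drawn from the correlated normal and mapped back through each column's empirical quantiles. All function names and the toy data are hypothetical; the authors' actual tooling, features, and settings may differ.

```python
import math
import random
from statistics import NormalDist, mean

STD_NORMAL = NormalDist()  # standard normal, used for cdf / inverse cdf


def to_normal_scores(column):
    """Map a column to standard-normal scores via its empirical ranks."""
    n = len(column)
    order = sorted(range(n), key=lambda i: column[i])
    z = [0.0] * n
    for rank, i in enumerate(order, start=1):
        z[i] = STD_NORMAL.inv_cdf(rank / (n + 1))  # rank/(n+1) stays inside (0, 1)
    return z


def correlation_matrix(columns):
    """Pearson correlation between every pair of columns."""
    centered = []
    for col in columns:
        m = mean(col)
        centered.append([v - m for v in col])
    norms = [math.sqrt(sum(v * v for v in c)) for c in centered]
    d = len(columns)
    return [[sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
             for j in range(d)] for i in range(d)]


def cholesky(matrix):
    """Lower-triangular factor L with L * L^T == matrix (positive definite input)."""
    d = len(matrix)
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                L[i][j] = (matrix[i][j] - s) / L[j][j]
    return L


def empirical_quantile(sorted_column, u):
    """Inverse empirical CDF with linear interpolation between order statistics."""
    pos = u * (len(sorted_column) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_column) - 1)
    return sorted_column[lo] + (pos - lo) * (sorted_column[hi] - sorted_column[lo])


def sample_gaussian_copula(rows, n_samples, seed=0):
    """Fit a Gaussian copula to `rows` and draw `n_samples` synthetic rows."""
    rng = random.Random(seed)
    columns = [list(col) for col in zip(*rows)]
    L = cholesky(correlation_matrix([to_normal_scores(c) for c in columns]))
    sorted_cols = [sorted(c) for c in columns]
    d = len(columns)
    synthetic = []
    for _ in range(n_samples):
        eps = [rng.gauss(0.0, 1.0) for _ in range(d)]
        # Correlate the independent normals with the Cholesky factor.
        z = [sum(L[i][k] * eps[k] for k in range(i + 1)) for i in range(d)]
        # Map back to each feature's own marginal distribution.
        synthetic.append([empirical_quantile(sorted_cols[i], STD_NORMAL.cdf(z[i]))
                          for i in range(d)])
    return synthetic


# Made-up "real" data: two correlated features, the second one skewed.
data_rng = random.Random(42)
real_rows = []
for _ in range(200):
    x = data_rng.gauss(0.0, 1.0)
    y = math.exp(0.8 * x + 0.6 * data_rng.gauss(0.0, 1.0))
    real_rows.append([x, y])

synthetic_rows = sample_gaussian_copula(real_rows, n_samples=500, seed=1)
```

In an augmentation setting, rows generated this way for a given class would be appended to the real training rows of that class before fitting the classifier. How many synthetic rows to add per class is a design choice that this summary does not specify.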

- Synthetic Cell Painting profiles improve DNA damage response (DDR) prediction
- Gaussian Copula data best preserves morphological fidelity among synthetic generators
- SVM with synthetic augmentation achieves high DDR classification accuracy
- Predicted DDR inducers are validated by γH2AX induction and cell viability assays

**Keywords:** Systems biology; Data processing in systems biology; Machine learning

## Linked entities

- **Proteins:** H2AXA (Histone superfamily protein)

## Full-text entities

- **Genes:** BRCA1 (BRCA1 DNA repair associated) [NCBI Gene 672] {aka BRCAI, BRCC1, BROVCA1, FANCS, IRIS, PNCA4}, CCND1 (cyclin D1) [NCBI Gene 595] {aka BCL1, D11S287E, PRAD1, U21B31}, AP2B1 (adaptor related protein complex 2 subunit beta 1) [NCBI Gene 163] {aka ADTB2, AP105B, AP2-BETA, CLAPB1}, EGFR (epidermal growth factor receptor) [NCBI Gene 1956] {aka ERBB, ERBB1, ERRP, HER1, NISBD2, NNCIS}, SELP (selectin P) [NCBI Gene 6403] {aka CD62, CD62P, GMP140, GRMP, LECAM3, PADGEM}, KRAS (KRAS proto-oncogene, GTPase) [NCBI Gene 3845] {aka C-K-RAS, CFC2, K-RAS2A, K-RAS2B, K-RAS4A}, FAAH (fatty acid amide hydrolase) [NCBI Gene 2166] {aka FAAH-1, FAAH1, PSAB}, H2AX (H2A.X variant histone) [NCBI Gene 3014] {aka H2A.X, H2A/X, H2AFX}, TUBA1B (tubulin alpha 1b) [NCBI Gene 10376] {aka K-ALPHA-1}, MAOA (monoamine oxidase A) [NCBI Gene 4128] {aka BRNRS, MAO-A}, CDK2 (cyclin dependent kinase 2) [NCBI Gene 1017] {aka CDKN2, p33(CDK2)}, ITIH2 (inter-alpha-trypsin inhibitor heavy chain 2) [NCBI Gene 3698] {aka H2P, ITI-HC2, SHAP}, MYC (MYC proto-oncogene, bHLH transcription factor) [NCBI Gene 4609] {aka MRTL, MYCC, bHLHe39, c-Myc}, PARP1 (poly(ADP-ribose) polymerase 1) [NCBI Gene 142] {aka ADPRT, ADPRT 1, ADPRT1, ARTD1, PARP, PARP-1}, BRD4 (bromodomain containing 4) [NCBI Gene 23476] {aka CAP, CDLS6, FSHRG4, HUNK1, HUNKI, MCAP}
- **Diseases:** ovarian, breast, prostate, and pancreatic cancers (MESH:D010051), cytotoxic (MESH:D064420), mycoplasma (MESH:D009175), DDR (MESH:C537658), sarcoma (MESH:D012509), homologous recombination deficiencies (MESH:C535296), cancer (MESH:D009369), osteosarcoma (MESH:D012516), melanoma (MESH:D008545)
- **Chemicals:** formaldehyde (MESH:D005557), docetaxel (MESH:D000077143), dimethyl sulfoxide (MESH:D004121), McCoy's 5A medium (MESH:C113109), amoxapine (MESH:D000657), etoposide (MESH:D005047), Alexa Fluor 488 (MESH:C000711379), KF38789 (MESH:C453959), captopril (MESH:D002216), doxorubicin (MESH:D004317), cisplatin (MESH:D002945), mitoxantrone (MESH:D008942), norepinephrine (MESH:D009638), SDS (MESH:D012967), tetrindole (MESH:C082644), LY2183240 (MESH:C522826), vincristine (MESH:D014750), paclitaxel (MESH:D017239), acetazolamide (MESH:D000086), methanol (MESH:D000432)
- **Species:** Homo sapiens (human, species) [taxon 9606]
- **Cell lines:** ES2: Homo sapiens (Human), Embryonic stem cell (CVCL_C769); HCC44: Homo sapiens (Human), Lung adenocarcinoma, Cancer cell line (CVCL_2060); U2OS: Homo sapiens (Human), Osteosarcoma, Cancer cell line (CVCL_0042); A549: Homo sapiens (Human), Lung adenocarcinoma, Cancer cell line (CVCL_0023); HTB-96: Mus musculus (Mouse), Hybridoma (CVCL_A8FQ)

## Figures

8 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12962177/full.md

---
Source: https://tomesphere.com/paper/PMC12962177