Predicting DNA damage response using synthetic cell painting profiles and experimental analysis
Chaeyoung Seo, Hyemin Lim, Zanyue Piao, Yeong Jun Koh, Seung Jin Lee

TL;DR
This paper introduces a machine learning framework using synthetic cell painting data to accurately predict DNA damage response, validated through experimental testing.
Contribution
A novel DDR prediction framework that augments real cell painting data with Gaussian copula synthetic profiles and is validated through γH2AX and viability assays.
Findings
An SVM trained on real data augmented with Gaussian copula synthetic data achieved an F1-score of 0.87 and an AUROC of 0.94.
Gaussian copula outperformed CTGAN, VAE, and CopulaGAN in preserving morphological fidelity.
The model identified DDR inducers in the external cpg-0012 dataset, and these were confirmed experimentally by γH2AX accumulation and reduced cell viability.
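The two metrics reported above can be illustrated with a minimal sketch. The labels and scores below are toy values chosen for illustration, not data from the paper, and the helper names `f1_score` and `auroc` are hypothetical. F1 is the harmonic mean of precision and recall, which makes it informative under skewed class distributions; AUROC is computed here as the fraction of (positive, negative) pairs that the score ranks correctly, with ties counted as half.

```python
def f1_score(y_true, y_pred):
    """F1 for the positive class (label 1): harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auroc(y_true, scores):
    """AUROC as the probability that a random positive outscores a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy, imbalanced example (3 positives, 5 negatives); values are illustrative only.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed decision threshold of 0.5

toy_f1 = f1_score(y_true, y_pred)
toy_auroc = auroc(y_true, scores)
```

In this toy case one positive is missed and one negative is flagged, so precision and recall are both 2/3, while the ranking-based AUROC is higher because it does not depend on the threshold.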
Abstract
Detecting DNA damage response (DDR) using cell painting profiles is challenging due to limited sample sizes and skewed class distributions. We established a robust classification framework to enhance DDR prediction based on synthetic data. Using the idr-0080 dataset, we generated synthetic profiles with the Gaussian copula, CTGAN, VAE, and CopulaGAN algorithms, and assessed their quality through fidelity metrics. Among four classifiers evaluated with real and/or synthetic data under preserved or resolved class imbalance, an SVM trained on real data augmented by Gaussian copula-generated synthetic data achieved the best performance (F1-score = 0.87, AUROC = 0.94). SHAP analysis highlighted key predictive morphological features. The model successfully identified known and previously unreported DDR inducers in the external cpg-0012 dataset, which were experimentally confirmed by γH2AX…
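The Gaussian copula step described in the abstract can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the summary does not say which library was used, the function names `fit_gaussian_copula` and `sample_gaussian_copula` are hypothetical, the marginals are modeled empirically, and the toy data stands in for minority-class morphological profiles. The idea is to map each feature to normal scores through its ranks, estimate the correlation of those scores, sample correlated normals, and map them back through each feature's empirical quantiles.

```python
from statistics import NormalDist

import numpy as np

_STD_NORMAL = NormalDist()
_to_normal = np.vectorize(_STD_NORMAL.inv_cdf)   # uniform (0, 1) -> normal score
_to_uniform = np.vectorize(_STD_NORMAL.cdf)      # normal score -> uniform (0, 1)


def fit_gaussian_copula(X):
    """Return the column-wise sorted data and the correlation of its normal scores.

    Assumes X has at least two feature columns.
    """
    n = X.shape[0]
    # Rank each column (1..n) and rescale into (0, 1) so inv_cdf stays finite.
    ranks = X.argsort(axis=0).argsort(axis=0) + 1
    z = _to_normal(ranks / (n + 1))
    corr = np.corrcoef(z, rowvar=False)
    return np.sort(X, axis=0), corr


def sample_gaussian_copula(sorted_X, corr, n_samples, rng):
    """Draw synthetic rows that follow the fitted dependence and empirical marginals."""
    n, d = sorted_X.shape
    z = rng.multivariate_normal(np.zeros(d), corr, size=n_samples)
    u = _to_uniform(z)
    # Invert each empirical marginal by interpolating between observed quantiles.
    grid = np.arange(1, n + 1) / (n + 1)
    cols = [np.interp(u[:, j], grid, sorted_X[:, j]) for j in range(d)]
    return np.column_stack(cols)


rng = np.random.default_rng(0)
# Toy stand-in for real minority-class profiles: 200 rows, 2 correlated features.
real = rng.multivariate_normal([0.0, 5.0], [[1.0, 0.9], [0.9, 1.0]], size=200)
sorted_real, corr = fit_gaussian_copula(real)
synthetic = sample_gaussian_copula(sorted_real, corr, n_samples=500, rng=rng)
augmented = np.vstack([real, synthetic])  # real data augmented with synthetic rows
```

Because the marginals are inverted by interpolation, synthetic values stay within the observed range of each feature; a parametric marginal would be needed to extrapolate beyond it. The augmented matrix could then be passed to any classifier, such as the SVM the paper reports.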
Figures 1 to 8 (images and captions not reproduced here).
Taxonomy
Topics: 3D Printing in Biomedical Research · Cellular Mechanics and Interactions · RNA Interference and Gene Delivery
