# Confounder-aware foundation modeling for accurate phenotype profiling in cell imaging

**Authors:** Giorgos Papanastasiou, Pedro P. Sanchez, Argyrios Christodoulidis, Guang Yang, Walter Hugo Lopez Pinaya

PMC · DOI: 10.1038/s44303-025-00116-9 · 2025-10-22

## TL;DR

This paper introduces a confounder-aware foundation model for Cell Painting images that predicts mechanisms of action (MoA) and compound targets more accurately than real or batch-corrected data, including for compounds not seen during training.

## Contribution

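The authors propose a confounder-aware foundation model that integrates a causal mechanism within a latent diffusion model. The model generates balanced synthetic datasets, which makes biological effect estimation more robust to experimental variability in cell imaging for drug discovery.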

## Key findings

- The model achieves state-of-the-art MoA and target prediction: 0.66 and 0.65 ROC-AUC for seen compounds, and 0.65 and 0.73 ROC-AUC for unseen compounds.
- On both seen and unseen compounds, it significantly surpasses real and batch-corrected data.
- The model is trained on over 13 million Cell Painting images and 107 thousand compounds, learning cellular phenotype representations that mitigate confounder impact.
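
To make the seen versus unseen distinction and the ROC-AUC metric concrete, here is a minimal sketch in plain Python. It is not the authors' evaluation pipeline: the function names `split_by_compound` and `roc_auc`, the hold-out fraction, and the toy data are all assumptions made for illustration. The sketch holds out whole compounds so that no "unseen" compound contributes any training sample, and it computes ROC-AUC as the probability that a randomly chosen positive scores higher than a randomly chosen negative.

```python
import random


def split_by_compound(compound_ids, held_out_fraction=0.2, seed=0):
    """Partition compounds into (seen, unseen) sets with no overlap.

    Hypothetical helper: splitting at the compound level, rather than
    the image level, keeps every image of a held-out compound out of
    the training data.
    """
    unique = sorted(set(compound_ids))
    rng = random.Random(seed)
    rng.shuffle(unique)
    n_unseen = max(1, int(round(len(unique) * held_out_fraction)))
    return set(unique[n_unseen:]), set(unique[:n_unseen])


def roc_auc(labels, scores):
    """ROC-AUC via pairwise comparison of positives and negatives.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (made up): binary labels for one MoA class and model scores.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))  # 0.75

# Toy compound identifiers (made up), several images per compound.
compounds = ["cpd%d" % (i % 10) for i in range(50)]
seen, unseen = split_by_compound(compounds)
print(len(seen), len(unseen))  # 8 2
```

The pairwise form is quadratic in the number of samples, so it only suits small examples; for multi-class MoA or target labels, a per-class score of this kind would typically be averaged across classes.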

## Abstract

Image-based profiling is rapidly transforming drug discovery, offering unprecedented insights into cellular responses. However, experimental variability hinders accurate identification of mechanisms of action (MoA) and compound targets. Existing methods commonly fail to generalize to novel compounds, limiting their utility in exploring uncharted chemical space. To address this, we present a confounder-aware foundation model integrating a causal mechanism within a latent diffusion model, enabling the generation of balanced synthetic datasets for robust biological effect estimation. Trained on over 13 million Cell Painting images and 107 thousand compounds, our model learns robust cellular phenotype representations, mitigating confounder impact. We achieve state-of-the-art MoA and target prediction for both seen (0.66 and 0.65 ROC-AUC) and unseen compounds (0.65 and 0.73 ROC-AUC), significantly surpassing real and batch-corrected data. This innovative framework advances drug discovery by delivering robust biological effect estimations for novel compounds, potentially accelerating hit expansion. Our model establishes a scalable and adaptable foundation for cell imaging, holding the potential to become a cornerstone in data-driven drug discovery.

## Full-text entities

- **Genes:** CHST3 (carbohydrate sulfotransferase 3) [NCBI Gene 9469] {aka C6ST, C6ST1, HSD}, CP (ceruloplasmin) [NCBI Gene 1356] {aka AB073614, CP-2}
- **Diseases:** SCM (MESH:D004195), CP (MESH:D002292), MoA (MESH:D009207)
- **Chemicals:** Ciproxifan (MESH:C115705), Erlotinib (MESH:D000069347), CP (-), T (MESH:D014316), DMSO (MESH:D004121), Bilastine (MESH:C445659), AMG-900 (MESH:C555658)
- **Species:** Fenestella gardiennetii (species) [taxon 2499855], Homo sapiens (human, species) [taxon 9606]
- **Cell lines:** CP — Muntiacus muntjak (Barking deer), Spontaneously immortalized cell line (CVCL_9126), U2OS — Homo sapiens (Human), Osteosarcoma, Cancer cell line (CVCL_0042)

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12546604/full.md

---
Source: https://tomesphere.com/paper/PMC12546604