Can virtual staining for high-throughput screening generalize?
Samuel Tonks, Cuong Nguyen, Steve Hood, Ryan Musso, Ceridwen Hopely, Steve Titus, Minh Doan, Iain Styles, Alexander Krull

TL;DR
This study investigates whether virtual staining models trained on diverse high-throughput screening data can effectively generalize across different cell types and phenotypes, revealing key factors influencing their robustness and limitations.
Contribution
It provides the first large-scale analysis of virtual staining model generalization across multiple cell types and phenotypes in HTS datasets, highlighting strategies for training data selection.
Findings
Training on non-toxic samples improves generalization to toxic conditions.
Models trained on ovarian or lung cells generalize better across cell types.
Models trained on breast cells show poor cross-type generalization.
Abstract
The large volume and variety of imaging data from high-throughput screening (HTS) in the pharmaceutical industry present an excellent resource for training virtual staining models. However, the potential of models trained under one set of experimental conditions to generalize to other conditions remains underexplored. This study systematically investigates whether data from three cell types (lung, ovarian, and breast) and two phenotypes (toxic and non-toxic conditions) commonly found in HTS can effectively train virtual staining models to generalize across three typical HTS distribution shifts: unseen phenotypes, unseen cell types, and the combination of both. Utilizing a dataset of 772,416 paired bright-field, cytoplasm, nuclei, and DNA-damage stain images, we evaluate the generalization capabilities of models across pixel-based, instance-wise, and biological-feature-based levels. Our…
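The three distribution shifts named in the abstract (unseen phenotypes, unseen cell types, and both combined) can be enumerated as a train/test grid. The following is a minimal sketch, assuming each model is trained on a single (cell type, phenotype) pair; the abstract does not state this explicitly, and the names `shift_kind` and `evaluation_grid` are hypothetical helpers introduced only for illustration.

```python
from itertools import product

# Cell types and phenotypes as listed in the abstract.
CELL_TYPES = ("lung", "ovarian", "breast")
PHENOTYPES = ("toxic", "non-toxic")


def shift_kind(train, test):
    """Classify a (cell_type, phenotype) test domain relative to a training domain."""
    same_cell = train[0] == test[0]
    same_pheno = train[1] == test[1]
    if same_cell and same_pheno:
        return "in-distribution"
    if same_cell:
        return "unseen phenotype"
    if same_pheno:
        return "unseen cell type"
    return "unseen cell type and phenotype"


def evaluation_grid():
    """Map every (train domain, test domain) pair to its shift category."""
    # Assumption: one training domain per model (not confirmed by the abstract).
    domains = list(product(CELL_TYPES, PHENOTYPES))
    return {(tr, te): shift_kind(tr, te) for tr in domains for te in domains}


if __name__ == "__main__":
    for (train, test), kind in evaluation_grid().items():
        print(f"train={train} test={test}: {kind}")
```

Under this assumption, each of the six training domains is evaluated on one in-distribution domain, one unseen-phenotype domain, two unseen-cell-type domains, and two domains where both differ.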
Taxonomy
Topics: Radiomics and Machine Learning in Medical Imaging · Biomedical Text Mining and Ontologies · AI in cancer detection
Methods: Sparse Evolutionary Training
