RxRx3-core: Benchmarking drug-target interactions in High-Content Microscopy
Oren Kraus, Federico Comitani, John Urbanik, Kian Kenyon-Dean, Lakshmanan Arumugam, Saber Saberian, Cas Wognum, Safiye Celik, and Imran S. Haque

TL;DR
This paper introduces RxRx3-core, a compact, accessible microscopy dataset and benchmark for drug-target interaction (DTI) prediction, intended to support the development of representation learning methods for high-content screening (HCS) data.
Contribution
The authors present a curated, compressed subset of RxRx3 with benchmarks, enabling easier adoption of representation learning for HCS data and zero-shot DTI prediction.
Findings
RxRx3-core reduces the dataset to 18GB while preserving the data needed to benchmark representation learning models on zero-shot DTI prediction.
Provides pre-trained embeddings and benchmarking tools for the research community.
Aims to accelerate discovery of drug-target interactions using HCS data.
Abstract
High Content Screening (HCS) microscopy datasets have transformed the ability to profile cellular responses to genetic and chemical perturbations, enabling cell-based inference of drug-target interactions (DTI). However, the adoption of representation learning methods for HCS data has been hindered by the lack of accessible datasets and robust benchmarks. To address this gap, we present RxRx3-core, a curated and compressed subset of the RxRx3 dataset, and an associated DTI benchmarking task. At just 18GB, RxRx3-core significantly reduces the size barrier associated with large-scale HCS datasets while preserving critical data necessary for benchmarking representation learning models against a zero-shot DTI prediction task. RxRx3-core includes 222,601 microscopy images spanning 736 CRISPR knockouts and 1,674 compounds at 8 concentrations. RxRx3-core is available on HuggingFace and…
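The abstract does not spell out how the zero-shot DTI task is scored. One common approach with phenomic embeddings, assumed here and not confirmed by this page, is to average image-level embeddings per perturbation and then rank CRISPR knockouts by cosine similarity to each compound. A gene whose knockout produces a phenotype resembling the compound's is treated as a candidate target. The sketch below illustrates that idea with NumPy. The function names (`mean_by_label`, `cosine_similarity_matrix`, `rank_targets`) and the toy data are hypothetical, not part of RxRx3-core or its benchmarking tools.

```python
import numpy as np

def mean_by_label(embeddings, labels):
    """Average replicate embeddings (N x d) that share a perturbation label.

    Hypothetical helper: assumes one embedding per image/well and a label per row.
    """
    unique = sorted(set(labels))
    labels = np.asarray(labels)
    means = np.stack([embeddings[labels == u].mean(axis=0) for u in unique])
    return unique, means

def cosine_similarity_matrix(compound_emb, gene_emb):
    """Pairwise cosine similarity between compounds (C x d) and gene knockouts (G x d)."""
    c = compound_emb / np.linalg.norm(compound_emb, axis=1, keepdims=True)
    g = gene_emb / np.linalg.norm(gene_emb, axis=1, keepdims=True)
    return c @ g.T  # shape (C, G)

def rank_targets(compound_emb, gene_emb, gene_names, top_k=5):
    """For each compound, return the top_k knockouts by descending cosine similarity."""
    sims = cosine_similarity_matrix(compound_emb, gene_emb)
    order = np.argsort(-sims, axis=1)[:, :top_k]
    return [[gene_names[j] for j in row] for row in order]

# Toy example (fabricated 2-D embeddings, for illustration only).
gene_emb = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
gene_names = ["GENE_A", "GENE_B", "GENE_C"]

# Two replicate embeddings per compound, averaged before scoring.
replicates = np.array([[0.9, 0.1], [0.9, 0.1], [0.1, 0.9], [0.1, 0.9]])
compound_names, compound_emb = mean_by_label(
    replicates, ["cmpd_1", "cmpd_1", "cmpd_2", "cmpd_2"]
)

print(compound_names, rank_targets(compound_emb, gene_emb, gene_names, top_k=1))
```

Signed cosine similarity is used here for simplicity; a real benchmark might instead use absolute similarity, aggregate across the 8 concentrations differently, or apply batch correction first. Those choices are left open because this page does not specify them.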
Taxonomy
Topics: Cell Image Analysis Techniques · Image Processing Techniques and Applications · Computational Drug Discovery Methods
