CA-DEL: An Open Multi-Target, Multi-Modal Benchmark for Learning from DNA-Encoded Library Screens
Mutian He, Hanqun Cao, Cheng Tan, Zijun Gao, Xiaojun Yao, Chunbin Gu, Pheng-Ann Heng

TL;DR
CA-DEL is a new public benchmark for machine learning in drug discovery, focusing on predicting selectivity among homologous enzymes using noisy sequencing data and validated binding affinities.
Contribution
CA-DEL introduces a multi-modal, multi-target benchmark that pairs DEL sequencing read counts with measured binding affinities, addressing the noisy screening signals that limit drug discovery models trained on DEL data.
Findings
The benchmark covers screens against three homologous carbonic anhydrase isoforms (CAII, CAIX, CAXII).
It integrates sequencing read counts with high-fidelity binding affinity data.
It is intended to support models that generalize from noisy read counts to accurate biophysical measurements.
Abstract
The success of machine learning in drug discovery hinges on learning the relationship between a chemical structure and its biological activity. While DNA-Encoded Library (DEL) technology can generate the massive datasets required for this task, its primary signal -- sequencing read counts -- is an indirect and often noisy proxy for true molecular binding affinity. To address the scarcity of public benchmarks for developing robust models that can overcome this data challenge, we introduce CA-DEL, a multi-dimensional public benchmark featuring screens against three homologous carbonic anhydrase isoforms. While recent benchmarks like KinDEL have introduced 3D poses for kinase targets, CA-DEL distinguishes itself by focusing on the selectivity challenge among these isoforms (CAII, CAIX, CAXII). Unlike benchmarks relying solely on noisy enrichment scores, CA-DEL…
