AssayMatch: Learning to Select Data for Molecular Activity Models
Vincent Fan, Regina Barzilay

TL;DR
AssayMatch is a framework that improves drug discovery models by selecting high-quality, compatible bioactivity data through assay attribution and embedding finetuning, leading to better predictive performance.
Contribution
AssayMatch introduces a novel data selection method that combines assay-level data attribution with embedding finetuning, so that models are trained on more homogeneous data.
Findings
Models trained on AssayMatch-selected data outperform those trained on full datasets.
AssayMatch improves prediction accuracy for most tested model-target pairs.
The method effectively filters out noisy or incompatible bioactivity experiments.
Abstract
The performance of machine learning models in drug discovery is highly dependent on the quality and consistency of the underlying training data. Due to limitations in dataset sizes, many models are trained by aggregating bioactivity data from diverse sources, including public databases such as ChEMBL. However, this approach often introduces significant noise due to variability in experimental protocols. We introduce AssayMatch, a framework for data selection that builds smaller, more homogeneous training sets attuned to the test set of interest. AssayMatch leverages data attribution methods to quantify the contribution of each training assay to model performance. These attribution scores are used to finetune language embeddings of text-based assay descriptions to capture not just semantic similarity, but also the compatibility between assays. Unlike existing data attribution methods, our…
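The selection idea in the abstract can be pictured with a small sketch. It assumes that each assay description has already been mapped to an embedding vector and that compatibility is scored by cosine similarity between a training assay and the test assay. Both points are illustrative assumptions, since the abstract does not spell out the scoring rule, and the function names `rank_assays_by_similarity` and `select_top_k` are hypothetical rather than part of AssayMatch.

```python
import numpy as np


def rank_assays_by_similarity(train_emb, test_emb):
    """Rank training assays from most to least similar to one test assay.

    Assumption: embeddings are non-zero vectors and cosine similarity is
    used as the compatibility score (not confirmed by the abstract).
    """
    train = np.asarray(train_emb, dtype=float)
    test = np.asarray(test_emb, dtype=float)
    # Normalise so that the dot product equals cosine similarity.
    train_unit = train / np.linalg.norm(train, axis=1, keepdims=True)
    test_unit = test / np.linalg.norm(test)
    scores = train_unit @ test_unit
    # Stable sort on negated scores gives descending order.
    order = np.argsort(-scores, kind="stable")
    return order, scores


def select_top_k(train_emb, test_emb, k):
    """Return the indices of the k training assays closest to the test assay."""
    order, _ = rank_assays_by_similarity(train_emb, test_emb)
    return order[:k].tolist()


# Toy 2-D embeddings standing in for finetuned assay-description embeddings.
toy_train = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]])
toy_test = np.array([1.0, 0.0])
selected = select_top_k(toy_train, toy_test, k=2)
```

In this toy case the two training assays pointing in nearly the same direction as the test assay are kept, and the orthogonal and opposite ones are dropped. A model would then be trained only on compounds from the selected assays.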
Taxonomy
Topics: Computational Drug Discovery Methods · Cell Image Analysis Techniques · Machine Learning in Materials Science
