Machine learning on DNA-encoded library count data using an uncertainty-aware probabilistic loss function
Katherine S. Lim, Andrew G. Reidenbach, Bruce K. Hua, Jeremy W. Mason, Christopher J. Gerry, Paul A. Clemons, Connor W. Coley

TL;DR
This paper introduces a regression-based method with an uncertainty-aware loss function for analyzing DNA-encoded library data, improving denoising and visualization of structure-activity relationships in drug discovery.
Contribution
It presents a novel Poisson-based negative log-likelihood loss function for DEL data, enabling uncertainty modeling and better SAR trend visualization compared to binary classifiers.
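To make the idea of a Poisson-based negative log-likelihood concrete, here is a minimal sketch in plain Python. It is not the paper's loss function: the names `poisson_nll` and `enrichment_nll`, and the parameterization of the expected count as a predicted enrichment times a hypothetical `expected_baseline`, are assumptions chosen for illustration only.

```python
import math


def poisson_nll(count, rate):
    """Negative log-likelihood of observing `count` under Poisson(rate)."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    # -log P(k | rate) = rate - k * log(rate) + log(k!)
    return rate - count * math.log(rate) + math.lgamma(count + 1)


def enrichment_nll(target_count, predicted_enrichment, expected_baseline):
    """Loss for a predicted enrichment value.

    Assumption (illustrative only): the expected count in the target
    selection equals predicted_enrichment * expected_baseline, where
    expected_baseline is a hypothetical control-derived expected count.
    """
    return poisson_nll(target_count, predicted_enrichment * expected_baseline)


# Penalty for over-predicting the rate by a factor of two,
# at a low count and at a high count.
low_count_penalty = poisson_nll(2, 4.0) - poisson_nll(2, 2.0)
high_count_penalty = poisson_nll(200, 400.0) - poisson_nll(200, 200.0)
```

In this sketch, the same twofold misprediction costs far more at 200 observed reads than at 2, so sparse, low-count observations pull less strongly on the model. This is one simple way a count-based likelihood can encode measurement uncertainty, which a binary enriched/not-enriched label cannot do.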
Findings
The proposed method effectively denoises DEL data.
It improves visualization of structure-activity relationships.
The model is robust to noise because it ignores low-confidence outliers.
Abstract
DNA-encoded library (DEL) screening and quantitative structure-activity relationship (QSAR) modeling are two techniques used in drug discovery to find small molecules that bind a protein target. Applying QSAR modeling to DEL data can facilitate the selection of compounds for off-DNA synthesis and evaluation. Such a combined approach has been shown recently by training binary classifiers to learn DEL enrichments of aggregated "disynthons" to accommodate the sparse and noisy nature of DEL data. However, a binary classifier cannot distinguish between different levels of enrichment, and information is potentially lost during disynthon aggregation. Here, we demonstrate a regression approach to learning DEL enrichments of individual molecules using a custom negative log-likelihood loss function that effectively denoises DEL data and introduces opportunities for visualization of learned…
Taxonomy
Topics: Computational Drug Discovery Methods · Machine Learning in Materials Science · Chemical Synthesis and Analysis
