Partial Product Aware Machine Learning on DNA-Encoded Libraries
Polina Binder, Meghan Lawler, LaShadric Grady, Neil Carlson, Sumudu Leelananda, Svetlana Belyanskaya, Joe Franklin, Nicolas Tilmans, Henri Palacci

TL;DR
This paper introduces a partial-product-aware machine learning approach for DNA-encoded libraries. It uses reaction yield data to account for incompletely synthesized molecules, with the aim of improving accuracy and generalization in chemical property prediction.
Contribution
It presents a method that incorporates partial-molecule information into machine learning models, enhancing prediction performance over models that assume each DNA tag corresponds to a single chemical structure.
Findings
Training on reaction yield data improves model accuracy.
Partial product awareness enhances generalization to unseen molecules.
GNN models outperform standard approaches on DEL data.
Abstract
DNA-encoded libraries (DELs) are used for rapid large-scale screening of small molecules against a protein target. These combinatorial libraries are built through several cycles of chemistry and DNA ligation, producing large sets of DNA-tagged molecules. Training machine learning models on DEL data has been shown to be effective at predicting molecules of interest dissimilar from those in the original DEL. Machine learning chemical property prediction approaches rely on the assumption that the property of interest is linked to a single chemical structure. In the context of DNA-encoded libraries, this is equivalent to assuming that every chemical reaction fully yields the desired product. However, in practice, multi-step chemical synthesis sometimes generates partial molecules. Each unique DNA tag in a DEL therefore corresponds to a set of possible molecules. Here, we leverage reaction…
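The idea that one DNA tag corresponds to a set of possible molecules can be illustrated with a small sketch. This is not the authors' method. It assumes, purely for illustration, that each synthesis cycle succeeds independently with a known yield and that a failed cycle simply leaves its building block unattached. The function name `partial_product_distribution`, the building block labels, and the yield values are all hypothetical.

```python
from itertools import product
from math import prod


def partial_product_distribution(building_blocks, yields):
    """Map each possible (partial) product of one DNA tag to its probability.

    Illustrative assumptions, not taken from the paper:
    - cycle i attaches building_blocks[i] with probability yields[i];
    - cycles succeed or fail independently of each other;
    - a failed cycle leaves its building block out of the molecule.
    """
    if len(building_blocks) != len(yields):
        raise ValueError("need exactly one yield per building block")

    distribution = {}
    # Enumerate every success/failure pattern across the synthesis cycles.
    for outcome in product([True, False], repeat=len(yields)):
        attached = tuple(
            block for block, ok in zip(building_blocks, outcome) if ok
        )
        probability = prod(
            y if ok else 1.0 - y for y, ok in zip(yields, outcome)
        )
        distribution[attached] = distribution.get(attached, 0.0) + probability
    return distribution


# Hypothetical three-cycle compound with made-up per-cycle yields.
dist = partial_product_distribution(["A1", "B7", "C3"], [0.9, 0.6, 0.8])
for molecule, p in sorted(dist.items(), key=lambda item: -item[1]):
    print(molecule, round(p, 3))
```

Under these assumptions the fully reacted product is only one entry among eight, and the probabilities over all entries sum to one. A model that treats the tag as the full product alone would ignore the remaining probability mass.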
Taxonomy
Topics: Chemical Synthesis and Analysis · Computational Drug Discovery Methods · Machine Learning in Materials Science
