A hybrid unsupervised methodology on artificial intelligence filtering for automatically processing cellular DNA-encoded library (DEL) datasets
Yiran Huang, Xiao Tan, Xiaoyu Li, Feng Xiong, Siu Ming Yiu

TL;DR
A new AI-based method improves processing of DNA-encoded library data for drug discovery by accurately identifying hit compounds.
Contribution
A hybrid unsupervised AI methodology is introduced for efficient and accurate hit identification in noisy cell-based DEL datasets.
Findings
The automated workflow shows high consistency with experimental results across different library sizes.
The method generalizes well to different target proteins, such as INSR and TPOR.
Pre-trained models and datasets are publicly available for further use and validation.
Abstract
DNA-encoded library (DEL) technology has been developed as a powerful platform for drug development. Live cell-based selection methodologies were recently developed to expedite drug candidate discovery with higher biological relevance. Nevertheless, hit characterization is hampered by the prominent background signals of cell-based selections. An automated data-processing pipeline that tolerates noisy sequencing output is therefore highly desirable. Herein, we report an automatic method that identifies the most promising hits from large quantities of cell-based DEL data with improved accuracy and efficiency. The workflow is based on a comprehensive unsupervised algorithm that incorporates data pre-processing, feature extraction and outlier filtering, descriptor-based classification, similarity score ranking, and active compound prediction. We performed…
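The feature extraction and outlier filtering step named in the abstract can be illustrated with a minimal sketch. The abstract does not specify the features or the filter, so everything below is an assumption made for illustration: the function names `log2_enrichment` and `robust_outliers`, the pseudocount, the median/MAD robust z-score, and the 3.5 cutoff are hypothetical choices, not the authors' algorithm. The sketch scores each compound by its depth-normalized enrichment in the target selection over a control selection, then keeps compounds whose score is an outlier relative to the background.

```python
import math
from statistics import median


def log2_enrichment(target_counts, control_counts, pseudocount=1.0):
    """Log2 ratio of depth-normalized target vs. control read frequencies.

    Hypothetical feature: the pseudocount keeps compounds with zero
    control reads from producing a division by zero.
    """
    target_total = sum(target_counts.values())
    control_total = sum(control_counts.values())
    scores = {}
    for compound, t in target_counts.items():
        c = control_counts.get(compound, 0)
        t_freq = (t + pseudocount) / (target_total + pseudocount)
        c_freq = (c + pseudocount) / (control_total + pseudocount)
        scores[compound] = math.log2(t_freq / c_freq)
    return scores


def robust_outliers(scores, threshold=3.5):
    """Return (compound, robust z-score) pairs above the threshold.

    Uses the median and the median absolute deviation (MAD), which are
    less distorted by a few strong hits than the mean and standard
    deviation. The 0.6745 factor rescales the MAD so the score is
    comparable to a z-score under a normal background.
    """
    values = list(scores.values())
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return []  # background has no spread; no outlier can be scored
    flagged = []
    for compound, value in scores.items():
        z = 0.6745 * (value - med) / mad
        if z > threshold:
            flagged.append((compound, z))
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy read counts (made up): nine background compounds and one enriched one.
    target = {f"cpd{i}": 100 + i for i in range(9)}
    target["hit"] = 5000
    control = {name: 100 for name in target}
    for name, z in robust_outliers(log2_enrichment(target, control)):
        print(name, round(z, 1))
```

A median-based filter is used here because cell-based selections are described as having prominent background signals, where a mean-based cutoff would be pulled toward the very hits it is meant to detect. The later steps of the workflow (descriptor-based classification, similarity ranking, activity prediction) are not covered by this sketch.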
Taxonomy
Topics: Cell Image Analysis Techniques · Machine Learning in Bioinformatics · CRISPR and Genetic Engineering
