Combating small molecule aggregation with machine learning
Kuan Lee, Ann Yang, Yen-Chu Lin, Daniel Reker, Goncalo J. L. Bernardes, and Tiago Rodrigues

TL;DR
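This paper introduces a machine learning tool that flags small colloidally aggregating molecules (SCAMs). It reached 80% correct predictions in an out-of-sample validation, against 61 +/- 7% for a panel of expert chemists on the same test molecules. The tool also points to molecular features associated with aggregation, which can help triage false-positive hits in drug discovery screens.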
Contribution
The authors developed a bespoke machine learning tool that predicts SCAMs and exposes the molecular features behind each prediction. On the same test molecules it outperformed a panel of expert chemists.
Findings
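The tool made 80% correct predictions in a challenging out-of-sample validation.
A panel of expert chemists correctly predicted 61 +/- 7% of the same test molecules in a Turing-like test.
Up to 15-20% of ligands in publicly available chemogenomic databases are estimated to have a high potential to aggregate at typical screening concentrations.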
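The accuracy comparison reported here (80% for the tool, 61 +/- 7% for the expert panel per the abstract) can be illustrated with a minimal sketch. Everything in it is hypothetical: the labels are toy values, not data from the paper, the function name `accuracy` is ours, and reading the "+/-" as a sample standard deviation across experts is an assumption, since this summary does not say how the spread was computed.

```python
from statistics import mean, stdev


def accuracy(predicted, actual):
    """Fraction of matching labels (1 = aggregator, 0 = non-aggregator)."""
    if len(predicted) != len(actual):
        raise ValueError("label lists must have the same length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical toy labels for ten test molecules (NOT from the paper).
actual = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
model = [1, 0, 1, 0, 0, 0, 1, 1, 1, 0]
experts = [
    [1, 1, 1, 0, 0, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1, 0, 0, 0, 1, 1],
    [1, 0, 0, 1, 0, 1, 1, 1, 1, 0],
]

model_acc = accuracy(model, actual)
expert_accs = [accuracy(e, actual) for e in experts]

# Assumption: spread across experts summarized as a sample standard deviation.
print(f"model:   {model_acc:.0%}")
print(f"experts: {mean(expert_accs):.0%} +/- {stdev(expert_accs):.0%}")
```

Scoring the model and each expert against the same held-out labels is what makes the two numbers comparable.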
Abstract
Biological screens are plagued by false positive hits resulting from aggregation. Thus, methods to triage small colloidally aggregating molecules (SCAMs) are in high demand. Herein, we disclose a bespoke machine-learning tool to confidently and intelligibly flag such entities. Our data demonstrate an unprecedented utility of machine learning for predicting SCAMs, achieving 80% of correct predictions in a challenging out-of-sample validation. The tool outperformed a panel of expert chemists, who correctly predicted 61 +/- 7% of the same test molecules in a Turing-like test. Further, the computational routine provided insight into molecular features governing aggregation that had remained hidden to expert intuition. Leveraging our tool, we quantify that up to 15-20% of ligands in publicly available chemogenomic databases have the high potential to aggregate at typical screening…
