TL;DR
AMPL is an open-source, modular pipeline that improves the reproducibility of machine learning models for drug discovery. Benchmarks built with it show that dataset size and feature type strongly influence both model performance and the effectiveness of uncertainty quantification.
Contribution
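The paper introduces AMPL, an end-to-end, extensible pipeline for building and sharing ML models in drug discovery. AMPL extends the open-source DeepChem library with a range of machine learning and molecular featurization tools, and the authors benchmark it on a large collection of pharmaceutical datasets.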
Findings
Physicochemical descriptors and deep learning-based graph representations outperform traditional fingerprints.
Larger datasets yield more accurate models.
Uncertainty quantification varies in effectiveness across datasets and models.
Abstract
One of the key requirements for incorporating machine learning into the drug discovery process is complete reproducibility and traceability of the model building and evaluation process. With this in mind, we have developed an end-to-end modular and extensible software pipeline for building and sharing machine learning models that predict key pharma-relevant parameters. The ATOM Modeling PipeLine, or AMPL, extends the functionality of the open source library DeepChem and supports an array of machine learning and molecular featurization tools. We have benchmarked AMPL on a large collection of pharmaceutical datasets covering a wide range of parameters. As a result of these comprehensive experiments, we have found that physicochemical descriptors and deep learning-based graph representations significantly outperform traditional fingerprints in the characterization of molecular features. We…
