AMPL: A Data-Driven Modeling Pipeline for Drug Discovery

Amanda J. Minnich; Kevin McLoughlin; Margaret Tse; Jason Deng; Andrew Weber; Neha Murad; Benjamin D. Madej; Bharath Ramsundar; Tom Rush; Stacie Calad-Thomson; Jim Brase; Jonathan E. Allen

arXiv:1911.05211 · q-bio.QM · November 15, 2019

TL;DR

AMPL is an open-source, modular pipeline that enhances reproducibility in drug discovery machine learning models, demonstrating that dataset size and feature type significantly influence model performance and uncertainty quantification effectiveness.

Contribution

The paper introduces AMPL, a comprehensive, extensible pipeline for building and sharing ML models in drug discovery, integrating various tools and benchmarking its performance across diverse datasets.

Findings

1. Physicochemical descriptors and deep learning graph representations outperform traditional fingerprints.

2. Larger datasets improve prediction accuracy and model performance.

3. Uncertainty quantification varies in effectiveness across datasets and models.

Abstract

One of the key requirements for incorporating machine learning into the drug discovery process is complete reproducibility and traceability of the model building and evaluation process. With this in mind, we have developed an end-to-end modular and extensible software pipeline for building and sharing machine learning models that predict key pharma-relevant parameters. The ATOM Modeling PipeLine, or AMPL, extends the functionality of the open source library DeepChem and supports an array of machine learning and molecular featurization tools. We have benchmarked AMPL on a large collection of pharmaceutical datasets covering a wide range of parameters. As a result of these comprehensive experiments, we have found that physicochemical descriptors and deep learning-based graph representations significantly outperform traditional fingerprints in the characterization of molecular features. We…
