MoleculeNet: A Benchmark for Molecular Machine Learning
Zhenqin Wu, Bharath Ramsundar, Evan N. Feinberg, Joseph Gomes, Caleb Geniesse, Aneesh S. Pappu, Karl Leswing, Vijay Pande

TL;DR
MoleculeNet provides a comprehensive benchmark suite for molecular machine learning, enabling standardized evaluation of algorithms across diverse datasets and highlighting the strengths and limitations of current methods.
Contribution
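The paper introduces MoleculeNet, a large-scale benchmark that combines curated public datasets, standardized evaluation metrics, and open-source implementations of featurization and learning algorithms (released in the DeepChem library), enabling fair comparison of molecular machine learning methods.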
Findings
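Learnable representations broadly offer the best performance across the benchmark.
For quantum-mechanical and biophysical datasets, physics-aware featurizations can matter more than the choice of learning algorithm.
Learnable representations still struggle with complex tasks under data scarcity and with highly imbalanced classification.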
Abstract
Molecular machine learning has been maturing rapidly over the last few years. Improved methods and the presence of larger datasets have enabled machine learning algorithms to make increasingly accurate predictions about molecular properties. However, algorithmic progress has been limited due to the lack of a standard benchmark to compare the efficacy of proposed methods; most new algorithms are benchmarked on different datasets, making it challenging to gauge the quality of proposed methods. This work introduces MoleculeNet, a large-scale benchmark for molecular machine learning. MoleculeNet curates multiple public datasets, establishes metrics for evaluation, and offers high quality open-source implementations of multiple previously proposed molecular featurization and learning algorithms (released as part of the DeepChem open source library). MoleculeNet benchmarks demonstrate that…
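Standardized metrics are what make results comparable across papers. The sketch below illustrates two common choices, ROC-AUC for binary classification and RMSE for regression, in plain Python. It is not the DeepChem implementation. The helper names (roc_auc, rmse, mean_roc_auc) are hypothetical, and averaging per-task scores for multitask datasets is an assumption made for illustration.

```python
import math
from typing import Sequence


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """ROC-AUC as the probability that a random positive outranks a random negative.

    Ties in score count as half a win (Mann-Whitney U formulation).
    Pairwise O(P * N) comparison: fine for illustration, slow for large sets.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-square error for regression targets."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def mean_roc_auc(tasks_true: Sequence[Sequence[int]],
                 tasks_score: Sequence[Sequence[float]]) -> float:
    """Hypothetical multitask helper: unweighted mean of per-task ROC-AUC."""
    aucs = [roc_auc(t, s) for t, s in zip(tasks_true, tasks_score)]
    return sum(aucs) / len(aucs)


if __name__ == "__main__":
    # Toy classification task: 3 of 4 positive/negative pairs are ranked correctly.
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
    # Toy regression task: a single error of 2.0 over three predictions.
    print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
    # Two toy tasks averaged together.
    print(mean_roc_auc([[0, 0, 1, 1], [0, 1]], [[0.1, 0.4, 0.35, 0.8], [0.2, 0.9]]))
```

ROC-AUC is computed here through its ranking interpretation, which shows directly that it depends only on the ordering of scores, not their scale. In practice a library routine would replace the quadratic loop.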
Taxonomy
Topics: Machine Learning in Materials Science · Computational Drug Discovery Methods · Mass Spectrometry Techniques and Applications
