Pre-training via Denoising for Molecular Property Prediction
Sheheryar Zaidi, Michael Schaarschmidt, James Martens, Hyunjik Kim, Yee Whye Teh, Alvaro Sanchez-Gonzalez, Peter Battaglia, Razvan Pascanu, Jonathan Godwin

TL;DR
This paper introduces a denoising pre-training method for molecular property prediction that leverages large 3D structure datasets to learn representations, significantly improving performance on benchmarks like QM9.
Contribution
It introduces a denoising pre-training approach whose objective corresponds to learning a molecular force field directly from equilibrium 3D structures, achieving state-of-the-art results on molecular property prediction tasks.
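The link between denoising and force fields can be sketched with the standard denoising score-matching argument. The notation is ours, not necessarily the paper's: E is the potential energy, F = -∇E the forces, k_B T the thermal energy, x_1, ..., x_N the equilibrium structures, σ the noise scale, and ε_θ the network's noise prediction.

```latex
\begin{align*}
p(\mathbf{x}) &\propto \exp\!\left(-\frac{E(\mathbf{x})}{k_B T}\right),
  \qquad
  \nabla_{\mathbf{x}} \log p(\mathbf{x}) = \frac{\mathbf{F}(\mathbf{x})}{k_B T},
  \qquad \mathbf{F} = -\nabla_{\mathbf{x}} E \\
p(\mathbf{x}) &\approx q_\sigma(\mathbf{x})
  = \frac{1}{N}\sum_{i=1}^{N} \mathcal{N}\!\left(\mathbf{x};\, \mathbf{x}_i,\, \sigma^2 I\right) \\
\tilde{\mathbf{x}} &= \mathbf{x}_i + \sigma\boldsymbol{\epsilon},
  \qquad \boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, I) \\
\mathcal{L}(\theta) &= \mathbb{E}_{i,\boldsymbol{\epsilon}}
  \left\| \boldsymbol{\epsilon}_\theta(\tilde{\mathbf{x}}) - \boldsymbol{\epsilon} \right\|^2 \\
\boldsymbol{\epsilon}_{\theta^\ast}(\tilde{\mathbf{x}})
  &= -\sigma\, \nabla_{\tilde{\mathbf{x}}} \log q_\sigma(\tilde{\mathbf{x}})
  \approx -\frac{\sigma}{k_B T}\, \mathbf{F}(\tilde{\mathbf{x}})
\end{align*}
```

The last line carries the argument: the loss-minimizing noise prediction equals -σ times the score of the Gaussian mixture, so under the mixture approximation it is proportional to the force at the noisy structure, up to the factor -σ/(k_B T).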
Findings
Achieves a new state of the art on the QM9 dataset
Pre-training improves performance across multiple benchmarks
Insights into factors affecting pre-training effectiveness
Abstract
Many important problems involving molecular property prediction from 3D structures have limited data, posing a generalization challenge for neural networks. In this paper, we describe a pre-training technique based on denoising that achieves a new state-of-the-art in molecular property prediction by utilizing large datasets of 3D molecular structures at equilibrium to learn meaningful representations for downstream tasks. Relying on the well-known link between denoising autoencoders and score-matching, we show that the denoising objective corresponds to learning a molecular force field -- arising from approximating the Boltzmann distribution with a mixture of Gaussians -- directly from equilibrium structures. Our experiments demonstrate that using this pre-training objective significantly improves performance on multiple benchmarks, achieving a new state-of-the-art on the majority of…
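The denoising objective described in the abstract can be sketched in NumPy as below. This is a minimal illustration under our own assumptions, not the authors' implementation: `predict_noise` is a hypothetical stand-in for the pre-trained network, the 3-atom coordinates and `sigma = 0.1` are made-up values, and each noisy sample is scored against the single equilibrium structure it was drawn from (the per-sample form of the Gaussian-mixture objective).

```python
import numpy as np


def gaussian_score(noisy, center, sigma):
    """Score of N(center, sigma^2 I) at `noisy`: -(noisy - center) / sigma^2."""
    return -(noisy - center) / sigma**2


def denoising_loss(positions, predict_noise, sigma, rng):
    """Per-molecule denoising loss (illustrative sketch, not the paper's code).

    positions: (n_atoms, 3) array of equilibrium coordinates.
    predict_noise: hypothetical model mapping noisy (n_atoms, 3) coordinates
        to a predicted noise array of the same shape.
    """
    eps = rng.standard_normal(positions.shape)  # eps ~ N(0, I)
    noisy = positions + sigma * eps             # x_tilde = x + sigma * eps
    pred = predict_noise(noisy)
    # Squared error per atom, averaged over atoms.
    return float(np.mean(np.sum((pred - eps) ** 2, axis=-1)))


# Toy 3-atom structure; coordinates are made up for illustration.
x_eq = np.array([[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]])
sigma = 0.1  # arbitrary noise scale for this example


def ideal_denoiser(noisy):
    # Optimal prediction for a single equilibrium structure: -sigma * score.
    return -sigma * gaussian_score(noisy, x_eq, sigma)


def zero_denoiser(noisy):
    # Trivial baseline that always predicts zero noise.
    return np.zeros_like(noisy)


ideal_loss = denoising_loss(x_eq, ideal_denoiser, sigma, np.random.default_rng(0))
zero_loss = denoising_loss(x_eq, zero_denoiser, sigma, np.random.default_rng(0))
print(ideal_loss, zero_loss)  # ideal_loss is ~0 up to floating-point rounding
```

The ideal denoiser recovers the injected noise exactly, while the zero predictor pays the full noise variance. A trained network lies between the two, and its output rescaled by -1/sigma approximates the score, which under the Boltzmann assumption points along the force.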
Taxonomy
Topics: Machine Learning in Materials Science · Computational Drug Discovery Methods · Protein Structure and Dynamics
