Supervised Pretraining for Molecular Force Fields and Properties Prediction
Xiang Gao, Weihao Gao, Wenzhi Xiao, Zhirui Wang, Chong Wang, Liang Xiang

TL;DR
This paper introduces a supervised pretraining approach on a large molecular dataset, significantly enhancing the accuracy of molecular property and force field predictions, and revealing rich structural information in learned representations.
Contribution
It presents a novel supervised pretraining method on 86 million molecules, improving downstream task performance and demonstrating the encoding of detailed molecular information.
Findings
Pretrained models outperform training from scratch on multiple tasks.
Linear probing reveals the model captures atom types, distances, and molecular fragments.
Supervised pretraining is a promising direction for molecular modeling.
Abstract
Machine learning approaches have become popular for molecular modeling tasks, including molecular force field and property prediction. Traditional supervised learning methods suffer from a scarcity of labeled data for particular tasks, motivating the use of large-scale datasets for other relevant tasks. We propose to pretrain neural networks on a dataset of 86 million molecules, with atom charges and 3D geometries as inputs and molecular energies as labels. Experiments show that, compared to training from scratch, fine-tuning the pretrained model can significantly improve performance on seven molecular property prediction tasks and two force field tasks. We also demonstrate that the learned representations from the pretrained model contain adequate information about molecular structures, by showing that linear probing of the representations can predict many types of molecular information…
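Linear probing, as mentioned in the abstract, means fitting only a linear model on top of frozen representations. The following is a minimal sketch of that idea, not the authors' implementation: the representations are random synthetic vectors standing in for pretrained features, the target is a made-up scalar (imagine an interatomic distance), and the names fit_linear_probe and predict_linear_probe are hypothetical.

```python
import numpy as np

def fit_linear_probe(features, targets, l2=1e-3):
    """Fit a ridge-regularized linear map from frozen features to targets."""
    # Append a constant column so the probe can learn a bias term.
    X = np.hstack([features, np.ones((features.shape[0], 1))])
    # Closed-form ridge solution (the bias is regularized too, for simplicity).
    A = X.T @ X + l2 * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ targets)

def predict_linear_probe(weights, features):
    """Apply a fitted probe to new frozen features."""
    X = np.hstack([features, np.ones((features.shape[0], 1))])
    return X @ weights

# Assumption: synthetic data stands in for frozen pretrained representations.
rng = np.random.default_rng(0)
reps = rng.normal(size=(200, 16))          # 200 samples, 16-dim features
true_w = rng.normal(size=16)               # hidden linear relationship
distances = reps @ true_w + 0.01 * rng.normal(size=200)

# Train the probe on 150 samples and evaluate on the remaining 50.
weights = fit_linear_probe(reps[:150], distances[:150])
pred = predict_linear_probe(weights, reps[150:])
mae = float(np.mean(np.abs(pred - distances[150:])))
print(f"held-out MAE of the linear probe: {mae:.4f}")
```

If a probe this simple reaches a low held-out error, the target is (approximately) linearly decodable from the representation, which is the kind of evidence the abstract refers to. The encoder itself is never updated in this setup.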
Taxonomy
Topics: Computational Drug Discovery Methods · Machine Learning in Materials Science · Various Chemistry Research Topics
