Machine Learning of Free Energies in Chemical Compound Space Using Ensemble Representations: Reaching Experimental Uncertainty for Solvation

Jan Weinreich; Nicholas J. Browning; O. Anatole von Lilienfeld

arXiv:2012.09722 · physics.chem-ph · April 14, 2021

TL;DR

This paper introduces a machine learning model that predicts solvation free energies across chemical compound space. It reaches experimental uncertainty at low computational cost by combining ensemble representations with short molecular dynamics sampling.

Contribution

The authors develop a novel Free energy Machine Learning (FML) model that employs Boltzmann-averaged ensemble representations and short MD simulations, reaching experimental uncertainty levels in solvation free energy predictions.
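To make the idea of a Boltzmann-averaged ensemble representation concrete, here is a minimal sketch, not the authors' implementation. It assumes each MD snapshot has already been converted to a fixed-length feature vector and has an associated energy in kcal/mol; the function names, the explicit reweighting, and the default temperature are illustrative assumptions.

```python
import numpy as np

# Boltzmann constant in kcal/(mol K)
KB_KCAL_PER_MOL_K = 0.0019872041


def boltzmann_weights(energies, temperature=298.15):
    """Return normalized Boltzmann weights for snapshot energies (kcal/mol)."""
    energies = np.asarray(energies, dtype=float)
    beta = 1.0 / (KB_KCAL_PER_MOL_K * temperature)
    # Shifting by the minimum energy avoids overflow and leaves the
    # normalized weights unchanged.
    shifted = energies - energies.min()
    unnormalized = np.exp(-beta * shifted)
    return unnormalized / unnormalized.sum()


def ensemble_representation(snapshot_features, energies, temperature=298.15):
    """Boltzmann-weighted average of per-snapshot feature vectors.

    snapshot_features: array of shape (n_snapshots, n_features)
    energies: array of shape (n_snapshots,), in kcal/mol
    (hypothetical inputs; any fixed-length molecular descriptor would do)
    """
    features = np.asarray(snapshot_features, dtype=float)
    weights = boltzmann_weights(energies, temperature)
    # Weighted sum over snapshots gives one vector per molecule.
    return weights @ features
```

If the snapshots come from an MD trajectory that already samples the Boltzmann distribution, a plain mean over snapshots would play the same role as the explicit weights used here. Either way, the result is a single vector per molecule that a regression model can take as input.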

Findings

01

FML's out-of-sample prediction errors decrease systematically with training set size, reaching experimental uncertainty (0.6 kcal/mol) after training on 490 molecules (80% of FreeSolv).

02

FML's accuracy is comparable to state-of-the-art physics-based methods.

03

Applied to 116k molecules, the model's predictions are used to analyze solvation and identify key structural features.

Abstract

Free energies govern the behavior of soft and liquid matter, and improving their predictions could have a large impact on the development of drugs, electrolytes or homogeneous catalysts. Unfortunately, it is challenging to devise an accurate description of effects governing solvation such as hydrogen-bonding, van der Waals interactions, or conformational sampling. We present a Free energy Machine Learning (FML) model applicable throughout chemical compound space and based on a representation that employs Boltzmann averages to account for an approximated sampling of configurational space. Using the FreeSolv database, FML's out-of-sample prediction errors of experimental hydration free energies decay systematically with training set size, and experimental uncertainty (0.6 kcal/mol) is reached after training on 490 molecules (80% of FreeSolv). Corresponding FML model errors are also on…
