Hydration free energies from kernel-based machine learning: Compound-database bias

Clemens Rauer; Tristan Bereau

arXiv:2007.00407·physics.chem-ph·July 2, 2020

TL;DR

This study introduces a kernel-based machine learning method for predicting hydration free energies of small organic molecules, emphasizing the importance of diverse training data to avoid bias and improve transferability.

Contribution

The paper presents a kernel-based approach with an atomic-decomposition ansatz that improves transferability over molecular learning, and it exposes the biases introduced by databases of experimental compounds when predicting hydration free energies.

Findings

1. Representation averaging over conformers improves accuracy.
2. Atomic decomposition increases transferability.
3. Biases from narrow chemical databases affect model performance.
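
To make the atomic-decomposition finding concrete, here is a minimal sketch of one common way to build an additive kernel model. It is not the authors' implementation: the Gaussian atom-pair kernel, the sum-over-atom-pairs molecular kernel, the function names, and the random toy data are all assumptions chosen for illustration. The point it shows is that a molecular prediction can be split into per-atom contributions that sum back to the molecular value.

```python
import numpy as np

def atom_pair_kernel(atoms_a, atoms_b, sigma=1.0):
    """Gaussian kernel between every atom of A and every atom of B.

    Each input is an (n_atoms, n_features) array of hypothetical atomic features.
    """
    diff = atoms_a[:, None, :] - atoms_b[None, :, :]
    return np.exp(-np.sum(diff**2, axis=-1) / (2.0 * sigma**2))

def molecular_kernel(atoms_a, atoms_b, sigma=1.0):
    """Molecular kernel taken as the sum over all atom pairs (an assumption)."""
    return atom_pair_kernel(atoms_a, atoms_b, sigma).sum()

def fit(train_mols, y, sigma=1.0, lam=1e-6):
    """Kernel ridge regression weights, one per training molecule."""
    K = np.array([[molecular_kernel(a, b, sigma) for b in train_mols]
                  for a in train_mols])
    return np.linalg.solve(K + lam * np.eye(len(train_mols)),
                           np.asarray(y, dtype=float))

def atomic_contributions(mol, train_mols, alpha, sigma=1.0):
    """Per-atom share of the prediction; the shares sum to the molecular value."""
    contrib = np.zeros(len(mol))
    for a_i, train in zip(alpha, train_mols):
        contrib += a_i * atom_pair_kernel(mol, train, sigma).sum(axis=1)
    return contrib

# Synthetic toy data: random features and made-up target values, not real molecules.
rng = np.random.default_rng(0)
train_mols = [rng.normal(size=(n, 3)) for n in (3, 4, 5, 6)]
y = [-1.0, -2.5, -0.3, -4.1]
alpha = fit(train_mols, y)

new_mol = rng.normal(size=(4, 3))
per_atom = atomic_contributions(new_mol, train_mols, alpha)
prediction = per_atom.sum()
```

Because the model is written over atoms rather than whole molecules, it can in principle be evaluated on molecules whose size or composition differs from the training set, which is the intuition behind the transferability claim.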

Abstract

We consider the prediction of a basic thermodynamic property, hydration free energies, across a large subset of the chemical space of small organic molecules. Our in silico study is based on computer simulations at the atomistic level with implicit solvent. We report on a kernel-based machine learning approach that is inspired by recent work in learning electronic properties, but differs in key aspects: The representation is averaged over several conformers to account for the statistical ensemble. We also include an atomic-decomposition ansatz, which we show offers significant added transferability compared to molecular learning. Finally, we explore the existence of severe biases from databases of experimental compounds. By performing a combination of dimensionality reduction and cross-learning models, we show that the rate of learning depends significantly on the breadth and variety…
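
The conformer-averaging idea in the abstract can be illustrated with a short sketch. It assumes each conformer is already encoded as a fixed-length feature vector, averages those vectors with a plain unweighted mean, and feeds the result to a Gaussian kernel ridge regression. The representation, the kernel, the hyperparameters, and the random toy data are all assumptions here, since this summary does not specify what the paper actually uses.

```python
import numpy as np

def conformer_averaged_representation(conformer_features):
    """Average per-conformer feature vectors, shape (n_conformers, n_features)."""
    return np.mean(np.asarray(conformer_features, dtype=float), axis=0)

def gaussian_kernel(A, B, sigma):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq_dists = (np.sum(A**2, axis=1)[:, None]
                + np.sum(B**2, axis=1)[None, :]
                - 2.0 * A @ B.T)
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma**2))

def fit_krr(X_train, y_train, sigma=0.5, lam=1e-10):
    """Solve (K + lam * I) alpha = y for the regression weights."""
    K = gaussian_kernel(X_train, X_train, sigma)
    return np.linalg.solve(K + lam * np.eye(len(X_train)), y_train)

def predict_krr(X_train, alpha, X_new, sigma=0.5):
    """Predict targets for new representations."""
    return gaussian_kernel(X_new, X_train, sigma) @ alpha

# Synthetic toy data: 6 "molecules", each with 5 conformers and 4 features.
rng = np.random.default_rng(0)
molecules = [rng.normal(size=(5, 4)) for _ in range(6)]
X = np.stack([conformer_averaged_representation(m) for m in molecules])
y = rng.normal(size=6)  # made-up targets, not hydration free energies

alpha = fit_krr(X, y)
train_predictions = predict_krr(X, alpha, X)
```

With a very small regularizer the model nearly interpolates its training targets, so `train_predictions` should closely match `y`; a realistic study would instead tune `sigma` and `lam` on held-out data.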
