Hydration free energies from kernel-based machine learning: Compound-database bias
Clemens Rauer, Tristan Bereau

TL;DR
This study introduces a kernel-based machine learning method for predicting hydration free energies of small organic molecules, emphasizing the importance of diverse training data to avoid bias and improve transferability.
Contribution
The paper presents a novel atomic-decomposition kernel-based approach that enhances transferability and highlights database bias issues in predicting hydration free energies.
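To make the atomic-decomposition idea concrete, here is a minimal sketch of one common way to build such a model: the kernel between two molecules is taken as the sum of a Gaussian kernel over all pairs of atoms, so a molecular prediction splits into per-atom contributions. This is an illustration under stated assumptions, not the authors' implementation. The random per-atom descriptors, the synthetic target, and the names `atomic_kernel`, `molecular_kernel`, `fit`, and `atomic_contributions` are all hypothetical.

```python
import numpy as np

def atomic_kernel(a, b, sigma):
    """Gaussian kernel between every atom in a (n_a, d) and every atom in b (n_b, d)."""
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def molecular_kernel(mols_a, mols_b, sigma):
    """Molecule-molecule kernel: sum of the atomic kernel over all atom pairs."""
    K = np.empty((len(mols_a), len(mols_b)))
    for i, a in enumerate(mols_a):
        for j, b in enumerate(mols_b):
            K[i, j] = atomic_kernel(a, b, sigma).sum()
    return K

def fit(mols, y, sigma, lam):
    """Kernel ridge regression weights, one per training molecule."""
    K = molecular_kernel(mols, mols, sigma)
    return np.linalg.solve(K + lam * np.eye(len(mols)), y)

def atomic_contributions(mol, train_mols, alpha, sigma):
    """Per-atom contributions; their sum equals the molecular prediction."""
    return sum(w * atomic_kernel(mol, t, sigma).sum(axis=1)
               for w, t in zip(alpha, train_mols))

# Toy data (assumption): random 4-dimensional descriptors per atom and a
# synthetic additive target standing in for a hydration free energy.
rng = np.random.default_rng(1)
atom_counts = [3, 5, 4, 6, 2, 7]
train_mols = [rng.normal(size=(n, 4)) for n in atom_counts]
y_train = np.array([m[:, 0].sum() for m in train_mols])

alpha = fit(train_mols, y_train, sigma=1.0, lam=1e-3)

new_mol = rng.normal(size=(5, 4))
per_atom = atomic_contributions(new_mol, train_mols, alpha, sigma=1.0)
y_new = molecular_kernel([new_mol], train_mols, sigma=1.0)[0] @ alpha
```

Because the model is a sum over atoms, molecules with different atom counts share one kernel, which is one plausible reason an atomic ansatz can transfer better than a whole-molecule representation.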
Findings
Representation averaging over conformers improves accuracy.
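Atomic decomposition increases transferability compared to molecular learning.
Databases of experimental compounds carry severe biases: the rate of learning depends significantly on the breadth and variety of the training data.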
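The conformer-averaging finding can be sketched as follows: each molecule's descriptor is averaged over its conformers before it enters a standard Gaussian kernel ridge regression. This is a minimal illustration, not the paper's pipeline. The descriptor is a random fixed-length vector, the target is synthetic, and `conformer_averaged_representation`, `gaussian_kernel`, `fit_krr`, and `predict_krr` are hypothetical names.

```python
import numpy as np

def conformer_averaged_representation(conformer_descriptors):
    """Average fixed-length descriptor vectors over the conformers of one molecule."""
    return np.mean(np.asarray(conformer_descriptors, dtype=float), axis=0)

def gaussian_kernel(A, B, sigma):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def fit_krr(X, y, sigma, lam):
    """Solve (K + lam * I) alpha = y for the regression weights."""
    K = gaussian_kernel(X, X, sigma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def predict_krr(X_train, alpha, X_new, sigma):
    """Predict targets for new averaged representations."""
    return gaussian_kernel(X_new, X_train, sigma) @ alpha

# Toy data (assumption): 20 molecules, 5 conformers each, 8-dimensional descriptors.
rng = np.random.default_rng(0)
conformers = rng.normal(size=(20, 5, 8))
X = np.stack([conformer_averaged_representation(c) for c in conformers])
y = X.sum(axis=1)  # synthetic target, not real hydration free energies

sigma, lam = 1.0, 1e-6
alpha = fit_krr(X, y, sigma, lam)
y_fit = predict_krr(X, alpha, X, sigma)
```

Averaging the representation, rather than averaging predictions made per conformer, keeps one training point per molecule, so the kernel matrix size does not grow with the number of conformers sampled.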
Abstract
We consider the prediction of a basic thermodynamic property, hydration free energies, across a large subset of the chemical space of small organic molecules. Our in silico study is based on computer simulations at the atomistic level with implicit solvent. We report on a kernel-based machine learning approach that is inspired by recent work in learning electronic properties, but differs in key aspects: The representation is averaged over several conformers to account for the statistical ensemble. We also include an atomic-decomposition ansatz, which we show offers significant added transferability compared to molecular learning. Finally, we explore the existence of severe biases from databases of experimental compounds. By performing a combination of dimensionality reduction and cross-learning models, we show that the rate of learning depends significantly on the breadth and variety…
