Building Robust Machine Learning Models for Small Chemical Science Data: The Case of Shear Viscosity
Nikhil V. S. Avula, Shivanand K. Veesam, Sudarshan Behera, and Sundaram Balasubramanian

TL;DR
This paper develops and evaluates machine learning strategies to accurately predict shear viscosity from small datasets, addressing overfitting, model selection, and uncertainty quantification to improve reliability in chemical science applications.
Contribution
The study demonstrates effective methods for model selection, performance estimation, and uncertainty quantification tailored for small chemical datasets, enhancing ML model robustness.
Findings
k-fold cross-validation reduces error estimate variance
Gaussian Process Regression provides reliable uncertainty estimates
Applicability domain improves prediction reliability on new small datasets
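To make the last two findings concrete, here is a minimal sketch of Gaussian Process regression with a squared-exponential kernel, written in plain NumPy. It is not the authors' implementation: the toy sine data, the kernel hyperparameters, and the `STD_THRESHOLD` cutoff used as a stand-in for an applicability-domain check are all assumptions chosen for illustration. The point is only that the predictive standard deviation grows for inputs far from the training data, which is what makes it usable as a reliability flag.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    # Squared-exponential kernel between two sets of 1-D inputs.
    d2 = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / length_scale**2)

def gp_predict(x_train, y_train, x_test, noise=1e-4, length_scale=1.0):
    # Standard GP posterior mean and standard deviation via Cholesky.
    K = rbf_kernel(x_train, x_train, length_scale) + noise * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_test, length_scale)
    K_ss = rbf_kernel(x_test, x_test, length_scale)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.diag(K_ss) - np.sum(v**2, axis=0)
    return mean, np.sqrt(np.maximum(var, 0.0))

# Toy data (synthetic, not from the paper): 12 noisy samples of sin(x).
rng = np.random.default_rng(0)
x_train = np.linspace(0.0, 3.0, 12)
y_train = np.sin(x_train) + 0.01 * rng.normal(size=x_train.size)

# One query inside the sampled range, one far outside it.
x_test = np.array([1.5, 6.0])
mean, std = gp_predict(x_train, y_train, x_test)

# Hypothetical applicability-domain rule: trust a prediction only if
# its predictive standard deviation is below an assumed threshold.
STD_THRESHOLD = 0.5
in_domain = std < STD_THRESHOLD

for x, m, s, ok in zip(x_test, mean, std, in_domain):
    print(f"x={x:.1f}  mean={m:.3f}  std={s:.3f}  in_domain={ok}")
```

In this sketch the query at x = 1.5 sits between training points and gets a small standard deviation, while the query at x = 6.0 falls back toward the prior and is flagged as outside the domain. A real application would tune the kernel hyperparameters and choose the threshold from validation data.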
Abstract
Shear viscosity, though being a fundamental property of all liquids, is computationally expensive to estimate from equilibrium molecular dynamics simulations. Recently, Machine Learning (ML) methods have been used to augment molecular simulations in many contexts, thus showing promise to estimate viscosity too in a relatively inexpensive manner. However, ML methods face significant challenges like overfitting when the size of the data set is small, as is the case with viscosity. In this work, we train several ML models to predict the shear viscosity of a Lennard-Jones (LJ) fluid, with particular emphasis on addressing issues arising from a small data set. Specifically, the issues related to model selection, performance estimation and uncertainty quantification were investigated. First, we show that the widely used performance estimation procedure of using a single unseen data set shows…
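The contrast between a single held-out set and k-fold cross-validation can be illustrated with a small simulation. The sketch below is an assumption-laden toy, not the paper's protocol: the synthetic data, the quadratic least-squares model, the 20% hold-out fraction, and k = 5 are all chosen for illustration. It repeats both procedures over many random splits of the same small data set and compares how much the resulting error estimates scatter.

```python
import numpy as np

def fit_predict(x_tr, y_tr, x_te, degree=2):
    # Least-squares polynomial fit, evaluated on the test inputs.
    coeffs = np.polyfit(x_tr, y_tr, degree)
    return np.polyval(coeffs, x_te)

def holdout_mse(x, y, rng, test_frac=0.2):
    # Error estimate from one random train/test split.
    idx = rng.permutation(len(x))
    n_test = int(round(test_frac * len(x)))
    te, tr = idx[:n_test], idx[n_test:]
    return np.mean((fit_predict(x[tr], y[tr], x[te]) - y[te]) ** 2)

def kfold_mse(x, y, rng, k=5):
    # Error estimate averaged over k folds of one random partition.
    folds = np.array_split(rng.permutation(len(x)), k)
    errs = []
    for i in range(k):
        te = folds[i]
        tr = np.concatenate([folds[j] for j in range(k) if j != i])
        errs.append(np.mean((fit_predict(x[tr], y[tr], x[te]) - y[te]) ** 2))
    return np.mean(errs)

# Small synthetic data set (40 points, not from the paper).
rng = np.random.default_rng(42)
x = rng.uniform(0.0, 1.0, size=40)
y = np.exp(2.0 * x) + 0.1 * rng.normal(size=x.size)

# Repeat each procedure over many random splits of the same data.
n_repeats = 200
holdout = np.array([holdout_mse(x, y, rng) for _ in range(n_repeats)])
kfold = np.array([kfold_mse(x, y, rng) for _ in range(n_repeats)])

holdout_std = holdout.std()
kfold_std = kfold.std()
print(f"hold-out: mean MSE={holdout.mean():.4f}, std={holdout_std:.4f}")
print(f"5-fold:   mean MSE={kfold.mean():.4f}, std={kfold_std:.4f}")
```

Because each k-fold estimate averages test errors over every point in the data set, it depends far less on which points happen to land in the test split, so its spread across repeats is narrower than that of the single hold-out estimate.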
Taxonomy
Topics: Machine Learning in Materials Science · Computational Drug Discovery Methods · Phase Equilibria and Thermodynamics
Methods: Gaussian Process
