On the role of gradients for machine learning of molecular energies and forces
Anders S. Christensen, O. Anatole von Lilienfeld

TL;DR
This paper investigates how including atomic forces in training data affects the accuracy of machine learning models for molecular energies and forces, revealing domain-dependent benefits and guiding dataset creation.
Contribution
It provides a detailed analysis of the impact of force labels on model accuracy across different molecular datasets and training scenarios.
Findings
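Including forces in the training data improves the accuracy of predicted energies and forces 7-fold over energy-only training, when training and predicting on different geometries of the same molecule.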
Force labels do not improve energy predictions for unseen molecules in new conformations.
Force labels and energy labels contribute equally per label to error convergence.
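The difference between energy labels and force labels can be made concrete with a toy example. The sketch below is not the authors' model: it assumes a hypothetical 1D double-well potential, a Gaussian basis with hand-picked centers and width, and a ridge-regularised least-squares fit. It only illustrates how force labels (the negative gradient of the energy) enter training as extra rows of the same linear system, next to the energy rows.

```python
import numpy as np

def energy(x):
    # Toy 1D double-well potential (an assumption for illustration only).
    return x**4 - 2.0 * x**2

def force(x):
    # Force is the negative gradient of the energy: F = -dE/dx.
    return -(4.0 * x**3 - 4.0 * x)

def energy_features(x, centers, sigma):
    # Gaussian basis functions k(x, c) = exp(-(x - c)^2 / (2 sigma^2)).
    d = x[:, None] - centers[None, :]
    return np.exp(-d**2 / (2.0 * sigma**2))

def force_features(x, centers, sigma):
    # -dk/dx, so that force_features @ alpha is the model's predicted force.
    d = x[:, None] - centers[None, :]
    return (d / sigma**2) * np.exp(-d**2 / (2.0 * sigma**2))

def fit(x_train, centers, sigma, use_forces, ridge=1e-6):
    # Energy labels give one equation per geometry.
    A = energy_features(x_train, centers, sigma)
    y = energy(x_train)
    if use_forces:
        # Force labels add further equations for the same coefficients.
        A = np.vstack([A, force_features(x_train, centers, sigma)])
        y = np.concatenate([y, force(x_train)])
    # Ridge regularisation written as extra least-squares rows.
    A_aug = np.vstack([A, np.sqrt(ridge) * np.eye(centers.size)])
    y_aug = np.concatenate([y, np.zeros(centers.size)])
    alpha, *_ = np.linalg.lstsq(A_aug, y_aug, rcond=None)
    return alpha

# Hypothetical settings, chosen only to keep the example small.
sigma = 0.5
centers = np.linspace(-2.0, 2.0, 15)
x_train = np.linspace(-1.5, 1.5, 6)
x_test = np.linspace(-1.4, 1.4, 50)

for use_forces in (False, True):
    alpha = fit(x_train, centers, sigma, use_forces)
    e_pred = energy_features(x_test, centers, sigma) @ alpha
    f_pred = force_features(x_test, centers, sigma) @ alpha
    e_mae = np.mean(np.abs(e_pred - energy(x_test)))
    f_mae = np.mean(np.abs(f_pred - force(x_test)))
    print(f"use_forces={use_forces}: energy MAE={e_mae:.4f}, force MAE={f_mae:.4f}")
```

Because the predicted force is the analytical derivative of the predicted energy, the two predictions stay consistent with each other whichever labels are used for training. The errors printed for this toy problem say nothing about the molecular datasets studied in the paper.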
Abstract
The accuracy of any machine learning potential can only be as good as the data used in the fitting process. The most efficient model therefore selects the training data that will yield the highest accuracy compared to the cost of obtaining the training data. We investigate the convergence of prediction errors of quantum machine learning models for organic molecules trained on energy and force labels, two common data types in molecular simulations. When training and predicting on different geometries corresponding to the same single molecule, we find that the inclusion of atomic forces in the training data increases the accuracy of the predicted energies and forces 7-fold, compared to models trained on energy only. Surprisingly, for models trained on sets of organic molecules of varying size and composition in non-equilibrium conformations, inclusion of forces in the training does not…
