A Bin and Hash Method for Analyzing Reference Data and Descriptors in Machine Learning Potentials

Martín Leandro Paleico; Jörg Behler

arXiv:2008.10977·physics.comp-ph·August 26, 2020·Mach. Learn. Sci. Technol.

TL;DR

This paper introduces the bin-and-hash (BAH) algorithm, a novel method to efficiently analyze and compare large multidimensional datasets in machine learning potentials, improving data handling and quality assessment.

Contribution

The BAH algorithm provides a general, efficient approach for identifying and comparing large sets of vectors in ML potentials, aiding in data reduction and quality control.
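To make the idea of binning and hashing concrete, here is a minimal Python sketch of one way such a scheme could work. It is not taken from the paper: the function names `bin_and_hash` and `group_by_hash`, the uniform `bin_width`, and the choice of SHA-1 as the hash function are all assumptions made for illustration. Each vector component is mapped to an integer bin index, and the tuple of bin indices is hashed into a single key, so vectors that fall into the same bins in every dimension share a key and can be found with a dictionary lookup instead of pairwise comparison.

```python
import hashlib
import math
from collections import defaultdict


def bin_and_hash(vector, bin_width=0.1):
    """Return a hash key for one vector (illustrative, not the paper's code).

    bin_width is an assumed uniform bin size applied to every component.
    """
    # Binning: replace every component by the index of the bin it falls into.
    bins = tuple(math.floor(x / bin_width) for x in vector)
    # Hashing: collapse the tuple of bin indices into one fixed-length key.
    return hashlib.sha1(repr(bins).encode("utf-8")).hexdigest()


def group_by_hash(vectors, bin_width=0.1):
    """Group vector indices by hash key; one pass over the data."""
    groups = defaultdict(list)
    for index, vector in enumerate(vectors):
        groups[bin_and_hash(vector, bin_width)].append(index)
    return dict(groups)


# Hypothetical two-dimensional descriptor vectors for three atomic environments.
vectors = [
    [0.12, 0.53],
    [0.18, 0.57],  # falls into the same bins as the first vector
    [0.92, 0.11],
]
groups = group_by_hash(vectors)
# Groups with more than one member are candidates for redundant data.
redundant = [members for members in groups.values() if len(members) > 1]
```

In this sketch, two vectors that are close but lie on opposite sides of a bin boundary receive different keys, so the result depends on the chosen `bin_width`; a real analysis would need to account for that.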

Findings

01 Enables efficient comparison of large multidimensional vectors

02 Reduces redundancy in reference datasets

03 Improves assessment of descriptor quality

Abstract

In recent years, the development of machine learning (ML) potentials (MLPs) has become a very active field of research. Numerous approaches have been proposed that make it possible to perform extended simulations of large systems at a small fraction of the computational cost of electronic structure calculations. The key to the success of modern ML potentials is their close to first-principles quality description of the atomic interactions. This accuracy is reached by using very flexible functional forms in combination with high-level reference data from electronic structure calculations. These data sets can include up to hundreds of thousands of structures covering millions of atomic environments to ensure that all relevant features of the potential energy surface are well represented. The handling of such large data sets is nowadays becoming one of the main challenges in the construction of ML…
