Massively Parallel Fitting of Gaussian Approximation Potentials
Sascha Klawohn, James R. Kermode, Albert P. Bartók

TL;DR
This paper introduces a scalable, data-parallel software package for fitting Gaussian Approximation Potentials (GAPs). By distributing the fit across multiple compute nodes with MPI, OpenMP, and ScaLAPACK, it removes the single-node memory limit on training set size and speeds up fitting on large datasets.
Contribution
A new parallel implementation of GAP fitting that scales to thousands of cores, enabling larger datasets and more complex systems.
Findings
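Scales to thousands of cores; descriptor evaluation requires no inter-process communication, and the subsequent linear solve is parallelised with ScaLAPACK.
Lifts the single-node memory limit on training set size.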
Provides substantial speedups in model fitting.
Abstract
We present a data-parallel software package for fitting Gaussian Approximation Potentials (GAPs) on multiple nodes using the ScaLAPACK library with MPI and OpenMP. Until now the maximum training set size for GAP models has been limited by the available memory on a single compute node. In our new implementation, descriptor evaluation is carried out in parallel with no communication requirement. The subsequent linear solve required to determine the model coefficients is parallelised with ScaLAPACK. Our approach scales to thousands of cores, lifting the memory limitation and also delivering substantial speedups. This development expands the applicability of the GAP approach to more complex systems as well as opening up opportunities for efficiently embedding GAP model fitting within higher-level workflows such as committee models or hyperparameter optimisation.
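The two stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: a plain loop stands in for MPI ranks, a one-dimensional Gaussian kernel (`kernel_row`) stands in for real descriptor evaluation, and a sum-reduction of per-rank normal-equation terms followed by a small dense solve stands in for the distributed ScaLAPACK solve. All function names and parameters are hypothetical. Normal equations square the condition number of the problem, so they are used here only to keep the example short.

```python
import math


def block_ranges(n_rows, n_ranks):
    """Split n_rows into n_ranks contiguous, near-equal half-open ranges."""
    base, extra = divmod(n_rows, n_ranks)
    ranges, start = [], 0
    for rank in range(n_ranks):
        stop = start + base + (1 if rank < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


def kernel_row(x, sparse_points, width=0.5):
    """Toy stand-in for descriptor + kernel evaluation of one training datum."""
    return [math.exp(-((x - s) ** 2) / (2.0 * width ** 2)) for s in sparse_points]


def local_block(xs, sparse_points, start, stop):
    """Design-matrix rows owned by one rank; needs only that rank's own data."""
    return [kernel_row(x, sparse_points) for x in xs[start:stop]]


def local_normal_terms(block, targets, m):
    """Per-rank contributions A_r^T A_r and A_r^T y_r."""
    ata = [[0.0] * m for _ in range(m)]
    aty = [0.0] * m
    for row, y in zip(block, targets):
        for i in range(m):
            aty[i] += row[i] * y
            for j in range(m):
                ata[i][j] += row[i] * row[j]
    return ata, aty


def solve(a, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit(xs, ys, sparse_points, n_ranks, jitter=1e-10):
    """Emulate a data-parallel fit; each loop iteration plays one rank."""
    m = len(sparse_points)
    ata = [[0.0] * m for _ in range(m)]
    aty = [0.0] * m
    for start, stop in block_ranges(len(xs), n_ranks):
        # Stage 1: independent row evaluation, no communication needed.
        block = local_block(xs, sparse_points, start, stop)
        ata_r, aty_r = local_normal_terms(block, ys[start:stop], m)
        # Stage 2: sum-reduction, the only step that would need communication.
        for i in range(m):
            aty[i] += aty_r[i]
            for j in range(m):
                ata[i][j] += ata_r[i][j]
    for i in range(m):
        ata[i][i] += jitter  # small regulariser, an assumption of this sketch
    return solve(ata, aty)
```

Calling `fit(xs, ys, sparse_points, n_ranks=1)` and the same call with `n_ranks=4` should return the same coefficients up to floating-point rounding. Each emulated rank holds only its own block of rows, which is the property that removes the single-node memory limit.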
Taxonomy
Topics: Gaussian Processes and Bayesian Inference · Machine Learning and Data Classification · Scientific Research and Discoveries
