Optimizing many-body atomic descriptors for enhanced computational   performance of machine learning based interatomic potentials

Miguel A. Caro

arXiv:1905.02142·physics.comp-ph·September 16, 2019

Optimizing many-body atomic descriptors for enhanced computational performance of machine learning based interatomic potentials

Miguel A. Caro

PDF

TL;DR

This paper introduces a simplified, more efficient SOAP atomic descriptor that significantly speeds up the evaluation of machine learning interatomic potentials, maintaining accuracy and stability.

Contribution

The authors develop a separable, analytical SOAP descriptor with recursion formulas, achieving tenfold speedups and improved stability for interatomic potential calculations.

Findings

01

Tenfold increase in computational speed.

02

Enhanced stability of radial expansion for distant neighbors.

03

Maintained interpolation accuracy of GAP models.

Abstract

We explore different ways to simplify the evaluation of the smooth overlap of atomic positions (SOAP) many-body atomic descriptor [Bart\'{o}k et al., Phys. Rev. B 87, 184115 (2013)]. Our aim is to improve the computational efficiency of SOAP-based similarity kernel construction. While these improved atomic descriptors can be used for general characterization and interpolation of atomic properties, their main target application is accelerated evaluation of machine-learning-based interatomic potentials within the Gaussian approximation potential (GAP) framework [Bart\'{o}k et al., Phys. Rev. Lett. 104, 136403 (2010)]. We achieve this objective by expressing the atomic densities in an approximate separable form, which decouples the radial and angular channels. We then express the elements of the SOAP descriptor (i.e., the expansion coefficients for the atomic densities) in analytical form…

Equations110

ρ^{(i)} (r) = j \in S_{i} (r_{cut}) \sum ρ_{j}^{(i)} (r),

ρ^{(i)} (r) = j \in S_{i} (r_{cut}) \sum ρ_{j}^{(i)} (r),

k^{SOAP} (i, j) \propto \int d \hat{R} \int d r ρ_{i}^{*} (r) ρ_{j} (\hat{R} r)^{2},

k^{SOAP} (i, j) \propto \int d \hat{R} \int d r ρ_{i}^{*} (r) ρ_{j} (\hat{R} r)^{2},

ρ (r) = j \in S (r_{cut}) \sum n l m \sum c_{n l m}^{j} g_{n} (r) Y_{l m} (θ, ϕ),

ρ (r) = j \in S (r_{cut}) \sum n l m \sum c_{n l m}^{j} g_{n} (r) Y_{l m} (θ, ϕ),

k^{SOAP} (i, j) \propto

k^{SOAP} (i, j) \propto

=

p_{n n^{'} l} (i) = m \sum c_{n l m}^{i} (c_{n^{'} l m}^{i})^{*}

p_{n n^{'} l} (i) = m \sum c_{n l m}^{i} (c_{n^{'} l m}^{i})^{*}

q^{SOAP} (i) = \frac{p ( i )}{p ( i ) \cdot p ( i )} .

q^{SOAP} (i) = \frac{p ( i )}{p ( i ) \cdot p ( i )} .

k^{SOAP} (i, j) = (q^{SOAP} (i) \cdot q^{SOAP} (j))^{ζ},

k^{SOAP} (i, j) = (q^{SOAP} (i) \cdot q^{SOAP} (j))^{ζ},

ρ_{j} (r) = exp (- \frac{∣ r - r _{j} ∣ ^{2}}{2 σ ^{2}}),

ρ_{j} (r) = exp (- \frac{∣ r - r _{j} ∣ ^{2}}{2 σ ^{2}}),

c_{n l m}^{j} = b_{n l}^{j} c_{l m}^{j} .

c_{n l m}^{j} = b_{n l}^{j} c_{l m}^{j} .

c_{l m}^{j} (r) \equiv 4 π exp [- α (r^{2} + r_{j}^{2})] i_{l} (2 α r r_{j}) Y_{l m}^{*} (θ_{j}, ϕ_{j}),

c_{l m}^{j} (r) \equiv 4 π exp [- α (r^{2} + r_{j}^{2})] i_{l} (2 α r r_{j}) Y_{l m}^{*} (θ_{j}, ϕ_{j}),

ρ_{j} (r) \approx ρ_{r, j} (r) ρ_{⊥, j} (θ, ϕ) .

ρ_{j} (r) \approx ρ_{r, j} (r) ρ_{⊥, j} (θ, ϕ) .

ρ_{j} (r) = exp [- \frac{1}{2} \frac{( r - r _{j} ) ^{2}}{σ _{r}^{2}}] exp [- \frac{1}{2} \frac{r _{⊥, j} ^{2}}{σ _{⊥}^{2}}],

ρ_{j} (r) = exp [- \frac{1}{2} \frac{( r - r _{j} ) ^{2}}{σ _{r}^{2}}] exp [- \frac{1}{2} \frac{r _{⊥, j} ^{2}}{σ _{⊥}^{2}}],

r_{⊥, j}^{2} \approx

r_{⊥, j}^{2} \approx

=

ρ_{j} (r) = (1) exp [- \frac{( r - r _{j} ) ^{2}}{2 σ _{r, j}^{2}}] (2) exp [- \frac{r _{j}^{2}}{σ _{⊥, j}^{2}}] (3) exp [\frac{r _{j} \cdot r _{j} r ^}{σ _{⊥, j}^{2}}],

ρ_{j} (r) = (1) exp [- \frac{( r - r _{j} ) ^{2}}{2 σ _{r, j}^{2}}] (2) exp [- \frac{r _{j}^{2}}{σ _{⊥, j}^{2}}] (3) exp [\frac{r _{j} \cdot r _{j} r ^}{σ _{⊥, j}^{2}}],

ρ (r) = j \sum A_{j} f (r; r_{j}, r_{cut}) ρ_{j} (r),

ρ (r) = j \sum A_{j} f (r; r_{j}, r_{cut}) ρ_{j} (r),

σ_{r, j}

σ_{r, j}

A_{j}

exp [\frac{r _{j} \cdot r _{j} r ^}{σ _{⊥, j}^{2}}] = l = 0 \sum \infty (2 l + 1) i_{l} (\frac{r _{j}^{2}}{σ _{⊥, j}^{2}}) P_{l} (cos ϵ_{j}),

exp [\frac{r _{j} \cdot r _{j} r ^}{σ _{⊥, j}^{2}}] = l = 0 \sum \infty (2 l + 1) i_{l} (\frac{r _{j}^{2}}{σ _{⊥, j}^{2}}) P_{l} (cos ϵ_{j}),

4 π l = 0 \sum \infty i_{l} (\frac{r _{j}^{2}}{σ _{⊥, j}^{2}}) m = - l \sum l Y_{l m}^{*} (θ_{j}, ϕ_{j}) Y_{l m} (θ, ϕ) .

4 π l = 0 \sum \infty i_{l} (\frac{r _{j}^{2}}{σ _{⊥, j}^{2}}) m = - l \sum l Y_{l m}^{*} (θ_{j}, ϕ_{j}) Y_{l m} (θ, ϕ) .

b_{n}^{j} = \int_{0}^{r_{cut}} d r g_{n} (r) A_{j} f (r; r_{j}, r_{cut}) exp [- \frac{1}{2} \frac{( r - r _{j} ) ^{2}}{σ _{r, j}^{2}}],

b_{n}^{j} = \int_{0}^{r_{cut}} d r g_{n} (r) A_{j} f (r; r_{j}, r_{cut}) exp [- \frac{1}{2} \frac{( r - r _{j} ) ^{2}}{σ _{r, j}^{2}}],

c_{n l m}^{j} = 4 π b_{n}^{j} exp (- \frac{r _{j}^{2}}{σ _{⊥, j}^{2}}) (2 l + 1) i_{l} (\frac{r _{j}^{2}}{σ _{⊥, j}^{2}}) Y_{l m}^{*} (θ_{j}, ϕ_{j}) .

c_{n l m}^{j} = 4 π b_{n}^{j} exp (- \frac{r _{j}^{2}}{σ _{⊥, j}^{2}}) (2 l + 1) i_{l} (\frac{r _{j}^{2}}{σ _{⊥, j}^{2}}) Y_{l m}^{*} (θ_{j}, ϕ_{j}) .

c_{n l m} = j \sum c_{n l m}^{j},

c_{n l m} = j \sum c_{n l m}^{j},

p_{n n^{'} l} = m = - l \sum l c_{n l m} c_{n^{'} l m}^{*} .

p_{n n^{'} l} = m = - l \sum l c_{n l m} c_{n^{'} l m}^{*} .

m = - l \sum l c_{n l m} c_{n^{'} l m}^{*} =

m = - l \sum l c_{n l m} c_{n^{'} l m}^{*} =

c_{n l 0} c_{n^{'} l 0}^{*} + 2 m = 1 \sum l Re {c_{n l m} c_{n^{'} l m}^{*}},

g_{n} (r) = α = 1 \sum n_{max} W_{n α} ϕ_{α} (r),

g_{n} (r) = α = 1 \sum n_{max} W_{n α} ϕ_{α} (r),

ϕ_{α} (r) = (1 - \frac{r}{r _{cut}})^{α + 2} / N_{α},

ϕ_{α} (r) = (1 - \frac{r}{r _{cut}})^{α + 2} / N_{α},

W = S^{- 1/2},

W = S^{- 1/2},

S_{α β} = \int_{0}^{r_{cut}} d r ϕ_{α} (r) ϕ_{β} (r) .

ϕ_{n_{max}} (r) = \frac{2}{σ _{0, r} π ^{1/4}} exp [- \frac{r ^{2}}{2 σ _{0, r}^{2}}] .

ϕ_{n_{max}} (r) = \frac{2}{σ _{0, r} π ^{1/4}} exp [- \frac{r ^{2}}{2 σ _{0, r}^{2}}] .

b_{α}^{j} = \int_{0}^{r_{cut}} d r ϕ_{α} (r) exp [- \frac{1}{2} \frac{( r - r _{j} ) ^{2}}{σ _{r, j}^{2}}] .

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Full text

Optimizing many-body atomic descriptors for enhanced computational performance

of machine learning based interatomic potentials

Miguel A. Caro

[email protected]

Department of Electrical Engineering and Automation, Aalto University, 02150, Espoo, Finland

Department of Applied Physics, Aalto University, 02150, Espoo, Finland

(July 17, 2019)

Abstract

We explore different ways to simplify the evaluation of the smooth overlap of atomic positions (SOAP) many-body atomic descriptor [Bartók et al., Phys. Rev. B 87, 184115 (2013)]. Our aim is to improve the computational efficiency of SOAP-based similarity kernel construction. While these improved atomic descriptors can be used for general characterization and interpolation of atomic properties, their main target application is accelerated evaluation of machine-learning-based interatomic potentials within the Gaussian approximation potential (GAP) framework [Bartók et al., Phys. Rev. Lett. 104, 136403 (2010)]. We achieve this objective by expressing the atomic densities in an approximate separable form, which decouples the radial and angular channels. We then express the elements of the SOAP descriptor (i.e., the expansion coefficients for the atomic densities) in analytical form given a particular choice of radial basis set. Finally, we derive recursion formulas for the expansion coefficients. This new SOAP-based descriptor allows for tenfold speedups compared to previous implementations, while improving the stability of the radial expansion for distant atomic neighbors, without degradation of the interpolation power of GAP models.

I Smooth overlap of atomic positions

Machine learning (ML) applied to materials modeling has rapidly gained widespread attention within the computational physics, chemistry and materials science communities due to its ability to speed up the simulation times for accurate prediction of the properties of materials. In particular, significant speedups are obtained with respect to simulation times currently required for atomistic simulation within first-principles approaches, such as density-functional theory (DFT). These new ML methodologies also grant access to new time and length scales in the simulation of interatomic interactions, allowing us to solve outstanding scientific problems whose study has been previously out of reach Caro et al. (2018a). Several ML approaches have arisen in recent years for interpolation of interatomic potential energy surfaces (PES), most notably based on artificial neural networks and kernel-based regression techniques Behler and Parrinello (2007); Bartók et al. (2010). All these approaches feed on two types of data: 1) the observables to be learned and interpolated (e.g., atomic energies and forces) which are used during the ML training stage and 2) the structural information that characterizes atomic environments, known in the ML jargon as “descriptors”, which are to be used both during the training stage and when interpolating the PES. Traditional descriptors used for characterization of PESs with “classical” (or “empirical”) force fields are bond distances, bond angles, improper/dihedral angles, etc., all of which involve interactions between two, three or, at most, a handful of particles. However, to make the most out of the newly available ML infrastructure and learn complex PESs there is a need for accurate, yet computationally inexpensive, many body descriptors Behler (2011); Huo and Rupp (2017).

The smooth overlap of atomic positions (SOAP) is a recently-introduced approach to encode atomic environments into a rotationally-invariant representation, given by the SOAP vectors Bartók et al. (2013). These many-body atomic descriptors are designed to provide an accurate measure of similarity between atomic environments, which can then be fed into kernel-based ML algorithms. In particular, when used in combination with the Gaussian approximation potential (GAP) formalism Bartók et al. (2010), SOAP enables accurate and efficient interpolation of potential energy surfaces Deringer and Csányi (2017). This accuracy has enabled molecular dynamics (MD) simulations of large and complex systems that were previously out of reach Caro et al. (2018a). However, compared to analytical force fields, SOAP-based GAPs are still CPU-expensive, with the evaluation of SOAP descriptors being the computational bottleneck. Being able to speed up SOAP evaluation would therefore provide an invaluable tool for making larger system sizes and simulation times accessible to ML-based MD simulation codes.

The (full) representation of the atomic density underlying the SOAP approach is done using 3D Gaussians centered at the atomic (nuclear) sites. The atomic density within a cutoff sphere $S_{i}(r_{\text{cut}})$ surrounding and centered on atom $i$ is therefore given by:

[TABLE]

where the sum extends over all atoms $j$ inside the cutoff sphere, possibly including $i$ itself. A rotationally-invariant comparison of two such densities is achieved by computing their overlap integral and averaging over all possible rotations $\hat{R}$ of one of the atomic environments Bartók et al. (2013):

[TABLE]

where the SOAP kernel $k^{\text{SOAP}}(i,j)$ gives a bounded measure of similarity between the atomic environment of atom $i$ and the atomic environment of atom $j$ . This similarity measure varies between 0 (the environments are nothing alike) and 1 (the environments are identical). Note that the exponent to which the overlap integral is raised, 2 in this case, must be greater than one for the SOAP kernel to retain angular information Bartók et al. (2013). Explicitly computing this integral is impractical from a computational efficiency standpoint, and thus the usefulness of SOAP is built on the reformulation of this problem. In this context, a discrete representation of the densities is achieved by expanding them in a basis. Using a combination of radial basis and spherical harmonics allows us to construct a rotationally-invariant descriptor in vector form, whose components are products of the expansion coefficients, without the need to explicitly perform the rotation. The expanded density takes the following form:

[TABLE]

where we have omitted the $(i)$ index for compactness. The $\{g_{n}\}$ is an orthonormal radial basis and $Y_{lm}(\theta,\phi)$ are the spherical harmonics.

From these expansion coefficients and after some algebraic manipulation, it can be shown that the SOAP kernel can be expressed as Bartók et al. (2013):

[TABLE]

where

[TABLE]

is the power spectrum of the atomic density. The vectors given by $\textbf{p}\equiv\{p_{nn^{\prime}l}\}$ define, after normalization, the SOAP many-body atomic descriptors:

[TABLE]

The expression for the SOAP kernel follows in the form of a dot product:

[TABLE]

where $\zeta$ is some positive number, usually greater than 1, that controls the “sharpness” of the kernel, that is, the ability of the kernel to emphasize differences between atomic environments (the larger $\zeta$ the sharper the kernel) Bartók and Csányi (2015). These SOAP descriptors simplify immensely the task of evaluating Eq. (2). However, there is a number of further simplifications and modifications of the original SOAP formulation that can dramatically increase the computational performance of this approach. In the next section we propose a new form of a SOAP-like many-body descriptor and prove its superior suitability for computational evaluation.

II New SOAP based on pseudogaussian functions

In the original SOAP formulation Bartók et al. (2013), the atomic densities are represented by atom-centered Gaussian functions (Fig. 1):

[TABLE]

and the corresponding expansion coefficients $c^{j}_{nlm}$ have the form:

[TABLE]

The angular dependence of the $b$ s arises because, if we were to retain the radial dependence of the expansion coefficients inside the $c^{j}_{lm}$ , these would take the form Bartók et al. (2013):

[TABLE]

where $\alpha=1/(2\sigma^{2})$ and all the quantities with subindex $j$ refer to the relative position of atom $j$ with respect to central atom $i$ . The modified spherical Bessel function of the first kind $i_{l}$ , that depends on $r$ , introduces the simultaneous $l$ and $n$ dependence of the coefficients, and at the same time makes their analytical derivation non-trivial (although still possible, see Refs. Jäger et al. (2018); Himanen et al. (2019)). All the details are given in the original SOAP paper Bartók et al. (2013).

To simplify this, we suggest to replace one problem by another. Let us express the atomic density in an approximate separable form:

[TABLE]

Conveniently, we will continue to use Gaussians:

[TABLE]

which would be the exact expression of a 3D Gaussian if $r$ was a regular Cartesian dimension and $r_{\perp,j}$ measured the distances from $\textbf{r}_{j}$ in the plane perpendicular to $r$ which contains $\textbf{r}_{j}$ . In our approximation, $r_{\perp,j}$ measures distances from $\textbf{r}_{j}$ in the spherical cap of radius $r_{j}$ . How well this spherical cap can be approximated by a plane depends on the ratio $r_{\perp,j}/r_{j}$ , which in practice depends on $\sigma/r_{j}$ , since $\sigma$ controls the decay length of our density as we move away from $\textbf{r}_{j}$ . The approximation improves as one moves further away from the origin. Therefore, in practice, we are not modeling our atomic density with Gaussians which are spherically-symmetric about $\textbf{r}_{j}$ , but about the origin. However, we must stress that this can also be understood as a choice, rather than an approximation, since in principle we have freedom in how we represent the atomic density, as long as permutational, translational and rotational invariances are preserved. An additional advantage of our approach is that we can choose different $\sigma$ s for the radial and angular channels, $\sigma_{r}$ and $\sigma_{\perp}$ , respectively. This further choice has the advantage that it reflects on the fact that length-preserving and angle-preserving interatomic interactions have different characteristic strengths. A final improvement in the choice of $\sigma$ is to incorporate a radial dependence, as already proposed in Ref. Willatt et al. (2019) and directly applicable to the original SOAP descriptor without the modifications incorporated here. This radial dependence allows for increasingly “blurry” atomic environments as one moves away from the center of the SOAP sphere. When done together with $r$ -dependent downscaling of the contribution of distant atomic neighbors to the density field, this strategy provides us with a flexible many-body kernel that is able to precisely encode the structural atomic information required for accurate interpolation of potential-energy surfaces.

We can approximate $r_{\perp,j}$ as

[TABLE]

where $\epsilon_{j}$ is the angle between r and $\textbf{r}_{j}$ . Equation (14) is equivalent to a second-order truncation of the Taylor expansion of the cosine, i.e., $\cos{\epsilon_{j}}\approx 1-\epsilon_{j}^{2}/2$ . With this approximation, the atomic density of atom $j$ is expressed as

[TABLE]

and the total density to be expanded is

[TABLE]

where we have explicitly introduced the amplitude $A_{j}$ to downscale the contribution of distant neighbors. In Eq. (15) we have notated the $\sigma$ s with $j$ subindices for the same reason. Radial downscaling has been discussed in more detail in Ref. Willatt et al. (2018). Casting the problem in an explicitly separable form means that expansion of the radius-dependent part of Eq. (15) becomes effectively a 1D problem. Therefore, the $r^{2}$ term that usually accompanies 3D integrals in spherical coordinates, which originates from the angular line elements, will not appear in our integrals for radial expansion [cf. Eq. (21) and next section]. The function $f(r;r_{j},r_{\text{cut}})$ can be any smoothing function that goes smoothly to zero at the cutoff. The most straightforward approach to downscaling distant neighbors is to introduce simple radial dependencies for these SOAP hyperparameters:

[TABLE]

where $\sigma_{0,r}$ and $\sigma_{0,\perp}$ define the Gaussian representation of the central atom in the SOAP sphere (although the value of $\sigma_{r,\perp}$ does not really have an effect on the representation of the central atom, since $r_{j}=0$ implies the angular Gaussian equals one always, cf. Eq. (14)), and $\alpha_{r},\alpha_{\perp},a\geq 0$ (when $\alpha_{r}=\alpha_{\perp}=a=0$ we retrieve the no downscaling limit). The third-order polynomial introduced in Eq. (18) is the simplest function that, for $a>0$ , goes smoothly from 1 at the origin to 0 at the cutoff. For $a\geq 1$ , its derivative at $r_{\text{cut}}$ is also smooth.

In Eq. (15), (1) is the radial part, (2) is a constant factor and (3) only depends on the angle between r and $\textbf{r}_{j}$ . Term (3) can be expressed as Kaufmann and Baumeister (1989):

[TABLE]

where $P_{l}$ is the Legendre polynomial. The addition theorem allows us to express this as Kaufmann and Baumeister (1989):

[TABLE]

We can obtain our radial expansion coefficients as

[TABLE]

where $r_{\text{cut}}$ is the SOAP sphere cutoff radius. With a polynomial basis (or a number of bases, for that matter) the $b$ s have analytical form, as shown in the next section. Our final coefficients are:

[TABLE]

Given that the $b$ s have an analytical form, all of these coefficients can be obtained analytically. Furthermore, even though the product $i_{l}(x^{2})\exp{\left(-x^{2}\right)}$ can be numerically unstable for large $x$ when the two factors are computed independently and then multiplied, when we compute the combined function the product is quite stable. For small $x$ the function is divergent but it can be computed as a limit using the Taylor expansion of $\exp{\left(-x^{2}\right)}$ .

The total coefficients are obtained by summing over atomic contributions:

[TABLE]

and, from them, the power spectrum is given by

[TABLE]

The $c_{nlm}$ are symmetric with respect to $m$ , because of the properties of the spherical harmonics: $c_{nl-m}=(-1)^{m}c_{nlm}^{*}$ . We can thus simplify the evaluation of $p_{nn^{\prime}l}$ as follows:

[TABLE]

which reduces the number of terms in the sum considerably, and also reduces the number of coefficients which need to be computed, since for $m<0$ the corresponding $c_{nlm}$ does not need to be computed (because it is not used). It also follows from Eq. (25) that the $p_{nn^{\prime}l}$ are symmetric upon exchange of $n$ and $n^{\prime}$ , which reduces the number of evaluations even further.

III Recursion formulas for the expansion coefficients

III.1 Radial expansion coefficients

The $b$ s can be computed analytically for certain bases. In the original SOAP paper Bartók et al. (2013), 3rd- and higher-order polynomials were proposed, even though they were not implemented in practice. Here we will use these polynomials since they allow us to infer recursion formulas for the radial expansion coefficients, as will be shown later. If our polynomial basis is $\{\phi_{\alpha}(r)\}$ , the orthonormal basis $\{g_{n}(r)\}$ is constructed as follows:

[TABLE]

with

[TABLE]

where $N_{\alpha}=\sqrt{r_{\text{cut}}/(2\alpha+5)}$ is a normalization factor. These polynomials, and their first and second derivatives, conveniently go to zero at the cutoff. The $W_{n\alpha}$ are obtained from the overlap matrix $S$ and need to be obtained only once for a given $n_{\text{max}}$ (they could even be tabulated):

[TABLE]

In practice, we use these polynomials for $\alpha=1,\dots,n_{\text{max}}-1$ and augment our basis set with a Gaussian function centered at the origin, which allows us to resolve the central atom in the atomic environment exactly:

[TABLE]

The normalization factor for this auxiliary Gaussian basis function assumes that $r_{\text{cut}}\gg\sigma_{0,r}$ .

When computing the radial expansion coefficients, we do not evaluate Eq. (21), i.e., the $b_{n}^{j}$ . Instead, we compute the overlap integrals for the $b_{\alpha}^{j}$ :

[TABLE]

We have left the $A_{j}f(r;r_{j},r_{\text{cut}})$ term [cf. Eq. (21)] out of Eq. (31) for simplicity, but without loss of generality since, as we will show later on, for our particular choice of smoothing function the functional form of the atomic density remains unchanged. From these $b_{\alpha}^{j}$ , the transformation is straightforward:

[TABLE]

The reason for working with the $b_{\alpha}^{j}$ is that the polynomial form of the $\{\phi_{\alpha}\}_{1}^{n_{\text{max}}-1}$ can be exploited to derive recursive relations. Consider the following integration by parts (where for clarity we have omitted the integration limits):

[TABLE]

We can manipulate the $r_{j}-r$ term in Eq. (33) as follows:

[TABLE]

so that, after collecting the terms, Eq. (33) reads as

[TABLE]

This recursion formula can be rewritten as

[TABLE]

While it would appear that we need $b_{-1}^{j}$ and $b_{0}^{j}$ to obtain the first coefficient, it can be shown that we can start the sequence by obtaining $b_{-1}^{j}$ from $b_{-2}^{j}$ assuming $b_{-3}^{j}=0$ . Therefore, we only need

[TABLE]

and the rest of the radial expansion coefficients are obtained from the recursion relation, Eq. (36).

The recursion formulas above are valid for $\alpha<n_{\text{max}}$ . The overlap integral between the atom’s Gaussian and the auxiliary Gaussian function $\phi_{n_{\text{max}}}(r)$ given by Eq. (30) is straightforward to derive:

[TABLE]

where $\sigma_{*}=\sqrt{\sigma_{0,r}^{2}+\sigma_{r,j}^{2}}$ . We have assumed above that the integrand, that is, the overlap between $\phi_{n_{\text{max}}}(r)$ and $\rho_{j}(r)$ , has effectively decayed to zero at the cutoff.

Finally, we must remark that with our choice of radial basis, linear dependencies in the overlap matrix $S$ develop for $n_{\text{max}}>12$ , i.e., a singular-value decomposition of $S$ yields some very small eigenvalues. Therefore, our implementation becomes unstable for $n_{\text{max}}>12$ .

III.2 Density smoothing at the cutoff

To ensure smoothness of interpolated potential energy surfaces and any general ML model based on SOAP, it is vital to remove sharp discontinuities of the kernel functions when atoms move in and out of the cutoff sphere Bartók et al. (2013). In our present implementation, we rely on a soft cutoff, a hard cutoff and a “buffer zone” in between. The hard cutoff delimits the sphere within which the SOAP descriptor “sees” neighboring atoms: any atom outside of the hard cutoff will be completely neglected. The soft cutoff delimits the sphere within which the atomic densities are represented fully, as per the expressions given above. The buffer zone is the region between soft and hard cutoff within which the atomic densities are smoothed out to zero. Therefore, a suitable smoothing function is defined as:

[TABLE]

and a smooth transition from 1 to 0 in all other cases. Note that this smoothing affects the density field; we have already introduced another smoothing function in Eq. (18) to downscale the heights of the Gaussians. We choose the following convenient definition for our density smoothing function:

[TABLE]

The characteristic decay length is selected by choosing a suitable filtering parameter $n_{\text{f}}$ , such that, numerically, the exponential is approximately zero at $r_{\text{hard}}$ . For instance, $n_{\text{f}}=4$ already brings the smoothing function down to $\sim 3.35\times 10^{-4}$ at the hard cutoff, regardless of the actual choice of cutoffs. Filtering parameters equal and larger than 4 are suitable choices, noting that choosing a very large number actually defeats the purpose of using a buffer zone. Therefore our implementation uses $n_{\text{f}}=4$ as default. The motivation for using a Gaussian as smoothing function is simple: since the product of two Gaussians (the atomic density and the smoothing function) is also a Gaussian, we can use the same recursion relations derived in the previous section, choosing the integration limits appropriately. That is, the expansion is divided into the $[0,r_{\text{soft}}]$ and $[r_{\text{soft}},r_{\text{hard}}]$ domains. Within the first domain, we expand $\rho_{j}(r)$ , whereas within the second domain we expand $f(r)\rho_{j}(r)$ , where $f(r)\rho_{j}(r)$ is also a Gaussian. In both cases, the overlap integrals are scaled by the downscaling factor introduced in Eq. (18). Figure 2 shows an example of how the smoothing procedure outlined above works in practice.

III.3 Angular expansion coefficients

Compared to the radial expansion coefficients, obtaining the angular part of Eq. (22), that is everything that depends on $l$ and/or $m$ , may seem trivial. However, there is still a number of simplifications that can be implemented in order to optimize computational performance. We start out with the product of the exponential and Bessel functions in that equation, for which we define the ilexp function:

[TABLE]

Based on the recursion relation for the modified spherical Bessel function of the first kind $i_{l}$ Arfken and Weber (2005), we can derive the following recursion relation for $\text{ilexp}(x;l)$ :

[TABLE]

for which we need the first two functions to start the sequence:

[TABLE]

For $x$ close to zero, we use the Taylor expansion of the exponential part to avoid the singularity:

[TABLE]

These expressions and recursion relations allow us to, computationally, obtain all the $\text{ilexp}(x;l)$ functions, from $l=0$ to $l=l_{\text{max}}$ , for the same cost of obtaining $\text{ilexp}(x;l_{\text{max}})$ .

The next step for the angular expansion is to optimize the evaluation of the (complex conjugate) of the spherical harmonics $Y_{lm}^{*}(\theta_{j},\phi_{j})$ :

[TABLE]

Computationally, evaluating this equation can be divided into three tasks: i) computing the prefactors, ii) computing the complex exponentials and iii) computing the Legendre polynomials. The first simplification is to obtain only the terms for which $m\geq 0$ , cf. Eq. (25). After that, a second simplification is that all of these three tasks can be expressed as a recursion series. The calculation of the factorial terms is trivially recursive, by varying $m$ for fixed $l$ . The calculation of the complex exponential is, perhaps surprisingly, rather expensive computationally if implemented naively:

[TABLE]

where we have used Euler’s formula. Modern compilers will take a significant amount of time evaluating these trigonometric functions. Instead, we can use Chebyshev’s recursion formula to considerably speed up this evaluation:

[TABLE]

where we only need to call the compiler’s implementation of the intrinsic functions $\cos$ and $\sin$ twice: once for $\cos(-\phi)=\cos\phi$ and once for $\sin(-\phi)=-\sin\phi$ . All the other function calls, up to $m=l_{\text{max}}$ , are to sums and multiplications, which are significantly faster.

Finally, the calculation of associated Legendre polynomials can also be cast as a (rather more complicated) recursion. We need the following six polynomials to initialize the recursion series:

[TABLE]

With these, we first need to obtain $P_{l0}$ and $P_{l1}$ with a recursion formula on $l$ :

[TABLE]

From those, we now get all the $P_{lm}$ for greater $m$ using a recursion formula on $m$ :

[TABLE]

For values of $x$ very close to $\pm 1$ these recursion formulas diverge, even though the actual $P_{lm}$ are finite. In that case we simply set all the $P_{lm}$ , for $l\geq 3$ and $m\geq 2$ , to zero. In our current implementation we establish the condition $||x|-1|<10^{-5}$ to consider $x$ to be “very close” to $\pm 1$ .

III.4 Speed

We tested our new implementation, written in Fortran, for speedup with respect to the implementation available from the QUIP code (also written in Fortran) ref . QUIP is linked through the interface available from Quippy. Since not only the algorithms to compute the SOAP descriptors but also the software implementations differ (even though they both use the same language), we attempt as fair a comparison as possible by only timing the time it takes both codes to carry out the atomic density expansion and construction of the SOAP vectors, that is, excluding extraneous operations like nearest-neighbor list builds and such. Our test system is an atomic structure made out of 10000 atoms randomly placed within a cubic box of side length $L=46.371$ Å and constrained to be not closer than 0.7 Å from one another.

For Quippy, we can only get the overall execution time for density expansion plus SOAP vector construction. For our implementation, we can get the timing for each of the three steps individually: radial expansion, angular expansion and vector construction. We tried our implementation built with two different compilers: the proprietary Intel compiler (“ifort”) and the free “gfortran” compiler. Although ifort achieved significantly better performance than gfortran with “aggressive” optimization flags (circa 40% better) this led to numerical instabilities. With numerically-safe optimization the ifort binary improves gfortran’s binary by approximately 20%. We only report timings for the ifort-built binary here. It is possible that by adapting the current code to prevent numerical instabilities better performance can be achieved with ifort or other proprietary compilers. However, we have not exhaustively attempted this for the present work. The results of our tests, for all the basis sets that can be constructed by combinations of $3\leq n_{\text{max}}\leq 12$ and $3\leq l_{\text{max}}\leq 12$ , are given in Fig. 3.

We start our discussion of performance by stating that basis set sizes typically used to train and evaluate accurate GAPs range in the $8\leq n_{\text{max}}\leq 12$ and $8\leq l_{\text{max}}\leq 12$ intervals Deringer and Csányi (2017); Deringer et al. (2018). We should therefore keep these ranges in mind when establishing the speedup factors that will be achieved with the new implementation in practical applications. First, we look at timings for the individual steps, i.e., the top row in Fig. 3. Remarkably, we can compute all the SOAP descriptors for our 10000 atoms, with accurate settings on a single-core CPU, in under one second. Even though computing the radial expansion coefficients in the small basis set region is the computational bottleneck, the increasing cost of adding more radial basis functions is quite modest thanks to the recursion relations, and the angular expansion (whose number of coefficients grows quadratically with $l_{\text{max}}$ ) becomes the bottleneck as the size of the basis grows further. Interestingly, the cost of SOAP vector construction, that is the multiplication and summation operations on the individual expansion coefficients that lead to the final SOAP vectors, takes up a significant fraction of the CPU time in the region of highly accurate representation (middle row in Fig. 3). In the most typical region of interest, $n_{\text{max}}=10$ and $l_{\text{max}}=10$ , each task takes about one third of the execution time. Therefore, there is no obvious computational bottleneck in our current implementation.

The most important panel in Fig. 3 is the lower one, where a comparison with the existing QUIP-SOAP implementation is presented. In our regions of interest, the speedup that we can achieve is approximately tenfold. As a matter of fact, the practical speedup is even higher because, as we will show in the next section, the new descriptor is also more accurate and better able to capture the chemistry of atomic environments. This means that the size of the basis required to achieve the same model performance with the new SOAP descriptor is actually smaller than with the old SOAP descriptor.

IV GAP model performance

Although the main objective of this paper is to improve the computational efficiency of SOAP calculations, we need to ensure that the introduced modifications to both atomic density representation and its basis expansion do not degrade the performance of ML models based on SOAP kernels. In other words, we need to ensure that the accuracy that can be obtained by a ML model of interatomic interactions, in terms of average error per atom, that employs the new SOAP is at least as low as what can be achieved with the original descriptor. We start by training an interatomic potential for amorphous carbon, followed by an adsorption energy model for the same material. In both cases we generate the model within the GAP framework, which is the typical ML-based interatomic potential framework that we expect will make use of the new SOAP. We also discuss in some detail what is the role of hyperparameters on model accuracy and how to optimize their values.

IV.1 Cohesive energy model

We trained a “cohesive energy” GAP model (i.e., a regular interatomic potential) for amorphous carbon (a-C) using the database from Deringer and Csányi Deringer and Csányi (2017). For the purposes of the current benchmark, our GAP model incorporates only SOAP descriptors, unlike the original a-C GAP that incorporates also two- and three-body descriptors. Briefly, within the GAP formalism, an interpolated local atomic energy for environment $i$ , $\bar{\epsilon}_{i}$ , is given by a linear combination of kernel functions:

[TABLE]

where the sum runs over environments in the “sparse” set (a subset of representative atomic environments in the training set). The fitting coefficients, $\alpha_{s}$ , are precomputed during the training stage. The number of coefficients, i.e., the “size” of the ML model, depends on the number of sparse configurations $N_{s}$ , and the cost of evaluating the model grows linearly with this number. The total number of configurations in the training set $N_{t}$ is much larger than $N_{s}$ : while all these $N_{t}$ configurations are used in deriving the $\alpha_{s}$ during the training stage, only $N_{s}$ configurations are used during production calculations. A practical guide to train GAPs, including notes on sparsification of the training set, is given in Ref. Bartók and Csányi (2015). Here, we focus only on the effect of sparse set size on the performance of GAP models trained using the new SOAP versus the old SOAP. The a-C database used contains approximately 4k supercells of different sizes with a total of 170k unique local atomic environments, and three times as many forces. Our tests consist of models trained with $N_{s}$ values between 100 and 1000, and we trained 10 different models for each value of $N_{s}$ , where the local environments in the sparse set were chosen randomly. For training, we can use all $N_{t}$ local environments and add the corresponding $3N_{t}$ forces. Currently, our new SOAP implementation is lacking kernel derivatives and, because of this, we have only trained old SOAP-based models including forces (this capability is available through QUIP). We have tested the model performance, computed as the root-mean square error (RMSE) per atom, on a set of 50 different 64-atom a-C structures that were not included in the training set.

The results of our test are given in Fig. 4. For the first three models (“old SOAP” with and without forces and “new SOAP”) we choose $n_{\text{max}}=l_{\text{max}}=8$ , $\sigma_{0,r}=\sigma_{0,\perp}=0.5$ Å, $\zeta=4$ and $r_{\text{cut}}=4.5$ Å, very similar to the parameters used in Ref. Deringer and Csányi (2017) to fit the original a-C GAP, and switch off the extra hyperparameters in the new SOAP that are not available from the old one, to ensure a fair comparison. For the last model, we train a “new SOAP” with optimized hyperparameters, as discussed in more detail in the next section. Namely, we choose $n_{\text{max}}=l_{\text{max}}=8$ , $\sigma_{0,r}=0.2$ Å, $\sigma_{0,\perp}=0.4$ Å, $\alpha_{r}=0.08$ , $\alpha_{\perp}=0.08$ , $a=1$ , $\zeta=3$ and $r_{\text{cut}}=4.5$ Å.

We observe that, with similar sets of hyperparameters, the new SOAP allows us to train an a-C GAP that is between 10% and 30% more accurate than with the old SOAP, depending on the number of sparse configurations used. Even when forces are added to the fit, the new SOAP (without forces in the training set) is as accurate as the old SOAP, and more accurate for small sparse sets. We also note a worsening of the fit for the old SOAP (without forces) as the number of sparse samples increases. It is possible that, as the number of configurations in the sparse set is increased beyond the optimum, the extra configurations result in added data noise, which worsens the fit. This noise can be party removed by adding local information (forces) to the fit and/or using a less noisy structural kernel (new SOAP). Another strategy to mitigate this problem is to tune the regularization parameter, by increasing it as the number of sparse configurations go up. The small symbol at $n_{\text{sparse}}=1000$ shows that a regularization parameter twice as large as the regular one seems to improve the fit in this case.

We conclude that, when used in combination with optimized hyperparameters, using the new SOAP allows us to reduce the error in the fit by nearly half. We attribute most of this improvement to two factors, one numerical and one physical. From the numerical perspective, the new SOAP implementation uses improved radial basis functions which are better at resolving narrower atom Gaussians and work better for longer cutoff radii. The improvement of the underlying physical model stems from the freedom to choose radial sigmas and angular sigmas independently. This reflects on the fact that interatomic interactions (“force constants” in the context of empirical force fields) have different characteristic strengths in the angular and radial directions. The separable form of the new SOAP descriptor allows us to incorporate this physically-motivated effect into the mathematical representation of the atomic environments.

IV.2 Adsorption energy model and role of hyperparameter on model performance

Training cohesive energy GAP models is computationally expensive, because of the amount of data involved in the training necessary to obtain a reasonable fit. Therefore, a systematic assessment of model performance versus choice of hyperparameters (HPs) is impractical over wide regions of HP space. By contrast, an adsorption energy GAP model is cheap to train. We introduced such a model for hydrogen adsorption on a-C in our previous work Caro et al. (2018b) and explored the idea of HP optimization via Monte Carlo sampling of HP space. While this optimization method is quite expensive compared to, e.g., Bayesian optimization, it allows us to “explore” wide regions of HP space and get a glimpse of how different parameters affect model performance. We retrained a large number of adsorption energy models (hundreds of thousands) using the data from Ref. Caro et al. (2018b) with our new SOAP descriptor, which allows us to reconstruct the convex hull for model performance versus HP choice, as shown in Fig. 5. In the figure we can observe how some parameters have a modest impact on the model performance, such as the regularization term $\sigma$ , while others have a very pronounced effect, e.g., the cutoff radius $r_{\text{cut}}$ . Compared to the same analysis that we carried out for the same data using the old SOAP descriptor Caro et al. (2018b), we observe that the new SOAP allows us to use the information of more distant neighbors to improve the accuracy of the model. That is, the performance of the new SOAP does not degrade significantly as $r_{\text{cut}}$ is increased beyond the optimum. At the same time, the data on Fig. 5 clearly show that the optimal values for $\sigma_{r}$ and $\sigma_{\perp}$ are different. The choice of HPs that allowed to obtain a dramatic improvement in cohesive energy GAP accuracy, shown in the previous section in Fig. 4, was informed from the results obtained for an adsorption energy GAP shown in Fig. 5. This clearly hints towards the idea of HP transferability across different ML models that feed on the same kind of atomic information.

IV.3 Hyperparameter optimization for interatomic potentials

As we have just shown, the particular choice of HPs can dramatically affect the performance of a ML model for interatomic interactions. While the adsorption energy model presented in Sec. IV.2 is computationally cheap to train, training a single GAP interatomic potential with typical database sizes, usual sparsification and including forces in the fit can take up to a few hundred CPU hours. Stochastic evaluation of an error-based objective function, such as the RMSE, for different combinations of HPs, can thus become a huge computational task.

Comparing to the original SOAP formulation, the SOAP-like descriptor introduced in this paper incorporates new HPs that make optimization, usually reliant on heuristics, even more difficult. Therefore, finding efficient ways to obtain combinations of HPs optimally suited to the problem at hand becomes necessary. A promising route towards improving GAP accuracy via tuning of HPs is Bayesian optimization Snoek et al. (2012). Bayesian optimization models the objective function as a sample from a Gaussian process compatible with the current set of observations. In our case, an observation is an evaluation of the RMSE of a GAP for a particular combination of HPs. Employing Bayesian optimization we can then predict: 1) the minimum of the RMSE in the search space of HPs and 2) where (in HP space) to acquire new observations so as to optimally improve the prediction for the minimum. Setting up a Bayesian optimizer is not necessarily a straightforward task, since such a model comes with its own set of HPs. In addition, the selection of HP combinations for acquiring new data is not done in an agnostic way but influenced by where previous data have been acquired. Therefore, an efficient parallelization strategy is not necessarily straightforward either. Using Bayesian optimization to improve the accuracy of ML interatomic potentials is an active area of research, and we are currently undertaking efforts in this direction. However, these sophisticated optimization strategies fall outside of the scope of the present manuscript.

Here, to optimize a GAP interatomic potential, we propose an intermediate solution between Bayesian optimization and the random search carried out for our adsorption energy model in Sec. IV.2: Sobol sequences Sobol (1976). A Sobol sequence is a series of points in an $N$ -dimensional hypercube, where the points are chosen in sequence so as to “close holes” in this space. These points fill space more homogeneously than randomly-chosen or grid samples, but still without any knowledge of the objective function (therefore the Sobol sequence is the same irrespective of which function is being sampled). A Sobol-sequence-based brute-force search for optimal HPs is as trivial to parallelize as the random search from the previous section. In Fig. 6 we show the results of this analysis for a series of a-C GAP models trained from the same database used in Sec. IV.1. To have comparable results with respect to computational effort, we did not vary $n_{\text{max}}$ and $l_{\text{max}}$ in this search for optimal HPs, sticking to $n_{\text{max}}=l_{\text{max}}=8$ .

For the search in HP space, we used 1000 Sobol vectors *[][.Forthesoftwareimplementation; seegithub.com/naught101/sobol_seqandpeople.sc.fsu.edu/~jburkardt/py_src/sobol/sobol.html.]sobol_1976. Each of the 1000 unique combinations of HPs was used to train 10 different GAP models. Each of these 10 models is built with 100 sparse-set configurations, randomly chosen from the full training set. Figure 6 (a) reports the configurational average RMSE and error bars (standard deviation of the RMSE) as a function of HP. For these cohesive energy GAPs, model performance as a function of HPs resembles our observations from Fig. 5 for the adsorption energy GAPs. In this case, optimal parameters are found around $\sigma_{0,r}=0.12$ Å, $\sigma_{0,\perp}=0.55$ Å, $\alpha_{r}=0.14$ , $\alpha_{\perp}=0.06$ , $a=1.6$ , $\zeta=3$ and $r_{\text{cut}}=4.3$ Å. In Fig. 6 (b) we use these HPs to repeat the same analysis we carried out in Sec. IV.1. Our main observation is that, while optimizing HPs directly on the same type of model (cohesive energy) for which these HPs are intended improves model performance, it does so only marginally. The Sobol-based HP optimization based on reducing the RMSE for a cohesive energy GAP produces optimal HPs which are similar to those obtained from an adsorption energy GAP, and lead to final model improvement by $\sim 10\%$ . Again, we highlight the idea that HPs are possibly transferable between ML models that predict different properties, but that feed on the same type of structural data (namely, the atomic structure as encoded via the SOAP descriptors). This raises the question of whether it is generally possible to use a computationally cheap surrogate model (in our case the adsorption energy GAP) to optimize HPs for more computationally demanding ML models (e.g., an interatomic potential or cohesive energy GAP). Based on the observations presented in this paper, we speculate that this may indeed be the case, although further research is required in this direction.

V Conclusions and outlook

In this paper we have presented a modified form of the many-body atomic descriptor known as SOAP Bartók et al. (2013) and a series of mathematical and computational recipes for its efficient evaluation. This type of descriptor is routinely used as essential input for novel ML-based Gaussian approximation potentials Bartók et al. (2010); Bartók and Csányi (2015) and other ML models used to understand and predict the properties of solids and molecules Caro et al. (2018b); De et al. (2016); Jäger et al. (2018); Himanen et al. (2019). While the primary objective of this work was to improve the computational efficiency of SOAP calculations, the new formulation also allows for a significant boost in accuracy. All in all, we expect that, at fixed accuracy, the total speedup for practical applications will be between a factor of 10 and a factor of 20. This is a remarkable number because it means that ML-based atomistic simulations can potentially be made one order of magnitude cheaper, bringing us one (big) step closer to the realm of empirical interatomic potentials. ML-based simulations of systems comprising up to one million atoms in the simulation box may now, or in the very near future, become within reach.

Acknowledgements.

The author is thankful to Volker Deringer and Gábor Csányi, from the University of Cambridge, Albert Bartók, from the Science and Technology Facilities Council, and Aki Morooka, formerly at Aalto University, for very helpful discussions on SOAP descriptors and GAP models. We thank A. Bartók again for porting our implementation of the SOAP-like descriptor presented here into the QUIP code (available as “SOAP express”). The author also thanks Dorothea Golze from Aalto University for her insights into basis set generation and code optimization. Funding from the Academy of Finland under project 310574, computational resources from CSC – IT Center for Science and travel support from the HPC-Europa3 program are gratefully acknowledged.

Bibliography20

The reference list from the paper itself. Each links out to its DOI / PubMed record.

1Caro et al. (2018 a) M. A. Caro, V. L. Deringer, J. Koskinen, T. Laurila, and G. Csányi, “Growth mechanism and origin of high s p 3 𝑠 superscript 𝑝 3 sp^{3} content in tetrahedral amorphous carbon,” Phys. Rev. Lett. 120 , 166101 (2018 a).
2Behler and Parrinello (2007) J. Behler and M. Parrinello, “Generalized neural-network representation of high-dimensional potential-energy surfaces,” Phys. Rev. Lett. 98 , 146401 (2007).
3Bartók et al. (2010) A. P. Bartók, M. C. Payne, R. Kondor, and G. Csányi, “Gaussian approximation potentials: The accuracy of quantum mechanics, without the electrons,” Phys. Rev. Lett. 104 , 136403 (2010).
4Behler (2011) J. Behler, “Atom-centered symmetry functions for constructing high-dimensional neural network potentials,” J. Chem. Phys. 134 , 074106 (2011).
5Huo and Rupp (2017) H. Huo and M. Rupp, “Unified representation of molecules and crystals for machine learning,” ar Xiv:1704.06439 (2017).
6Bartók et al. (2013) A. P. Bartók, R. Kondor, and G. Csányi, “On representing chemical environments,” Phys. Rev. B 87 , 184115 (2013).
7Deringer and Csányi (2017) V. L. Deringer and G. Csányi, “Machine learning based interatomic potential for amorphous carbon,” Phys. Rev. B 95 , 094203 (2017).
8Bartók and Csányi (2015) A.P. Bartók and G. Csányi, “Gaussian approximation potentials: A brief tutorial introduction,” Int. J. Quantum Chem. 115 , 1051 (2015).