A Novel Approach to Describe Chemical Environments in High Dimensional Neural Network Potentials
Emir Kocer, Jeremy K. Mason, Hakan Erturk

TL;DR
This paper introduces a new set of descriptors for atomic environments in neural network potentials, improving the accuracy of molecular dynamics simulations of silicon by outperforming existing descriptors.
Contribution
It proposes a novel, invariant, orthogonal, and differentiable descriptor set for atomic environments, enhancing neural network potential performance.
Findings
Descriptors outperform Behler-Parinello and SOAP in tests
Neural networks with new descriptors achieve higher accuracy
Improved modeling of atomic interactions in silicon
Abstract
A central concern of molecular dynamics simulations are the potential energy surfaces that govern atomic interactions. These hypersurfaces define the potential energy of the system, and have generally been calculated using either predefined analytical formulas (classical) or quantum mechanical simulations (ab initio). The former can accurately reproduce only a selection of material properties, whereas the latter is restricted to short simulation times and small systems. Machine learning potentials have recently emerged as a third approach to model atomic interactions, and are purported to offer the accuracy of ab initio simulations with the speed of classical potentials. However, the performance of machine learning potentials depends crucially on the description of a local atomic environment. A set of invariant, orthogonal and differentiable descriptors for an atomic environment is…
| T [K] | NN | RMSE [meV] | ||
|---|---|---|---|---|
| 300 | 6.03 | 16-8-1 | 16 | 0.22 |
| 300 | 6.03 | 16-16-1 | 16 | 0.23 |
| 300 | 6.03 | 25-8-1 | 25 | 0.35 |
| 600 | 6.96 | 16-8-1 | 16 | 0.56 |
| 600 | 6.96 | 16-16-1 | 16 | 0.64 |
| 600 | 6.96 | 25-8-1 | 25 | 0.51 |
| 1000 | 7.63 | 16-8-1 | 16 | 1.24 |
| 1000 | 7.63 | 16-16-1 | 16 | 1.98 |
| 1000 | 7.63 | 25-8-1 | 25 | 0.88 |
| 1500 | 7.92 | 16-8-1 | 16 | 2.62 |
| 1500 | 7.92 | 16-16-1 | 16 | 2.75 |
| 1500 | 7.92 | 25-8-1 | 25 | 2.3 |
| K [GPa] | G [GPa] | ||
|---|---|---|---|
| SW Stillinger and Weber (1985) | 101.4 | 56.4 | 0.335 |
| NNP | 101.7 | 56.3 | 0.337 |
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
A NOVEL APPROACH TO DESCRIBE CHEMICAL ENVIRONMENTS IN HIGH-DIMENSIONAL NEURAL NETWORK POTENTIALS
Emir Kocer
Department of Mechanical Engineering, Bogazici University, Istanbul, TURKEY
Jeremy K. Mason
Department of Materials Science and Engineering, University of California Davis, CA, USA
Hakan Erturk
Department of Mechanical Engineering, Bogazici University, Istanbul, TURKEY
Abstract
A central concern of molecular dynamics simulations are the potential energy surfaces that govern atomic interactions. These hypersurfaces define the potential energy of the system, and have generally been calculated using either pre-defined analytical formulas (classical) or quantum mechanical simulations (ab initio). The former can accurately reproduce only a selection of material properties, whereas the latter is restricted to short simulation times and small systems. Machine learning potentials have recently emerged as a third approach to model atomic interactions, and are purported to offer the accuracy of ab initio simulations with the speed of classical potentials. However, the performance of machine learning potentials depends crucially on the description of a local atomic environment. A set of invariant, orthogonal and differentiable descriptors for an atomic environment is proposed, implemented in a neural network potential for solid-state silicon, and tested in molecular dynamics simulations. Neural networks using the proposed descriptors are found to outperform ones using the Behler–Parinello and SOAP descriptors currently in the literature.
††preprint:
I Introduction
Molecular dynamics (MD) simulations are frequently used in computational materials science to study the behaviour of both molecular and bulk systems. These simulations assume that the energy of an atom can be defined as a function of the local atomic environment, and the way this is done can dramatically affect the accuracy and performance of the simulation. The two main approaches in the literature are to calculate the atomic energies and forces using electronic structure calculations Hautier et al. (2012), resulting in ab initio MD, or to pre-define functions describing the atomic interactions Subramanian (2009), resulting in classical MD. Perhaps the most popular electronic structure method in the literature is density functional theory (DFT) Burke (2012) due to its relative accuracy for condensed matter states. However, the accuracy provided by DFT has a tremendous computational cost that strongly restricts the time and length scales of the simulation. In contrast to ab initio methods, the potentials used in classical MD are generally many orders of magnitude faster to execute. This makes them suitable for longer simulations containing many millions of atoms, allowing the study of more complex phenomena in larger domains. The drawback of such potentials is that they contain a limited number of fitting parameters that are generally calibrated to reproduce the properties of a particular bulk phase, resulting in inaccuracies when simulating complex phenomena such as phase transitions Binks and Grimes (1993), dislocations Botu et al. (2016), and interfacial dynamics Süle and Szendrő (2014).
Recent interest in machine learning (ML) has encouraged the development of machine learning potentials (MLPs) with the goal of achieving quantum mechanical accuracy while approaching the speed of analytical potentials Behler and Parrinello (2007); Bartók et al. (2010); Li et al. (2015); Deringer and Csányi (2017); Ceriotti et al. (2018); Bose et al. (2018). These effectively apply non-parametric function regression to some reference data set to interpolate the potential energy surface (PES) of a local chemical environment. After a training process, the MLP is able to predict the energy of and force on a central atom from a description of the atomic neighborhood. Since ML algorithms can in principle reproduce even subtle many-body relationships, they provide higher flexibility than empirical potentials with a fixed functional form. Moreover, once they are trained on data collected by high-accuracy DFT simulations of a variety of configurations and phases, they can (given suitable coverage of the training data) maintain comparable accuracy during MD simulations with less computational expense than ab initio MD; they have been reported to be up to five orders of magnitude faster than quantum mechanical simulations with comparable accuracy Behler (2011); Snyder et al. (2012); Smith et al. (2017).
Typical reference data to train a MLP consists of a central atom’s local chemical information (relative positions and species of neighboring atoms) and potential energy. Similar to analytical potentials, the potential energy is assumed to depend only on the environment within some cutoff radius, and neighbors outside this volume are ignored. This atom-centered approach was initially proposed by Behler–Parinello Behler and Parrinello (2007), and enables one to calculate the total energy of a given system by summing over all the atoms. The preparation of the training data is crucial for an accurate representation of the PES, and DFT simulations are usually employed to calculate target energies and forces with high accuracy. Considering the cost of these DFT simulations, preparation of the training data is the most computationally demanding part of MLP development.
There are two key steps in the construction of a suitable reference set. First, since ML algorithms do not extrapolate as well as they interpolate, the space of local atomic environments should be widely sampled to increase the transferability of the potential. Several procedures have been proposed to construct the set of reference points used in training. Pukrittayakamee et al. Pukrittayakamee et al. (2009) used an importance sampling technique that selects training configurations based on atomic accelerations in MD simulations. This increases the sampling frequency where the potential energy gradient is large, i.e., where the PES is rapidly changing and could otherwise be sparsely sampled. Behler et al. Behler et al. (2008) attempted to equitably sample different regions of the configuration space by constructing a training set that included a mixture of crystal structures with different lattice parameters, amorphous structures, and some structures derived from metadynamics simulations. Another sampling technique that increases both the validity and accuracy of neural network potentials (NNPs) is extending the training set iteratively in a self-consistent way by detecting regions on the PES where the NNP performs poorly. Raff et al. Raff et al. (2005) employed a primitive NNP in MD simulations to produce new trajectories. Energies associated with these new trajectories were then calculated with an ab initio method. Configurations from trajectories where contradictions occured were added to the reference set, and a new NNP was trained. The procedure was initialized with an empirical potential to obtain the first target energies, but the overall method was shown to be independent of the initial potential. They also claimed that most of the points in the configuration space are redundant and only a small subset of possible configurations needs to be sampled, and devised a novelty sampling algorithm to compute a set of possible trajectories in MD simulations. Behler suggested that multiple NNs could be used to find poorly represented regions on the PES by identifying regions where they conflicted, and appending these configurations to the training set Behler (2011). Both iterative approaches were found to enhance the performance of a given NNP, and can be employed in conjunction.
The second major requirement for an MLP is some description of the local chemical environment as a set of real-valued numbers known as descriptors. Machine learning algorithms are unaware of the physical properties of the data by design, and training can be dramatically simplified by appropriate pre-conditioning of the inputs. For an MLP, the description of the local chemical environment should be invariant with respect to fundamental physical symmetries including translations and rotations of the coordinate system and permutations of the atomic labels. If invariance to these symmetries is not enforced during the construction of the descriptors, the MLP could predict that physically identical configurations have different energies. Let be the relative positions of the neighbors around the th atom. These are usually converted into a vector of real-valued numbers that are invariant to the physical symmetries. A recent review of MLPs and local structural descriptors by Behler Behler (2016) included an overview of the Behler–Parinello (BP) symmetry functions Behler and Parrinello (2007), one of the first sets of descriptors proposed and widely used in the literature Behler et al. (2008); Artrith and Behler (2012); Morawietz et al. (2016); Natarajan and Behler (2016). Separately, Bartók et al. Bartók et al. (2013) reviewed several descriptors commonly used in the literature, proposed the SOAP descriptors to represent atomic environments, and quantitatively compared several variations using ad-hoc tests. The SOAP descriptors have increasingly been used in the literature, both within Deringer and Csányi (2017); Dragoni et al. (2018) and without De et al. (2016); Rosenbrock et al. (2017) the context of MLPs.
Since the literature on MLPs is relatively immature, there remains the possibility that local structural desciptors could be further improved. This study proposes a set of local structural descriptors that are found to contain considerably more information than the BP descriptors, and to be considerably more efficient to evaluate than the SOAP descriptors. The proposed descriptors were integrated into a NNP for solid-state silicon which was implemented as a new pair-style for LAMMPS Plimpton (1995) and validated in MD simulations. Since the main subject of this study is the descriptors rather than the potential, the energies of configurations in the reference set were calculated by means of an empirical potential Stillinger and Weber (1985) to reduce the computational cost. These would usually have been calculated with ab initio methods to achieve higher accuracy, but at the price of more uncertainty in the systematic error. Finally, the performance of the NNP using the proposed descriptors was compared with that of comparable NNPs using the BP and SOAP descriptors.
II Method
II.1 Descriptors
The faithfulness of any MLP depends strongly on how accurately the local structural descriptors describe the atomic neighborhood. A robust description would ideally provide a one-to-one mapping (bijection) between atomic positions and descriptors up to the symmetries of the physical system. This section introduces a new set of local structural descriptors that are continuous, twice-differentiable and invariant to the physical symmetries identified in Section I.
Many steps in the construction resemble those for the SOAP descriptors Bartók et al. (2013). The first step is to map the list of relative atomic coordinates to a neighbor density function
[TABLE]
for a central atom , thereby handling any permutation symmetries. The summation is performed over all neighbors within a spherical region defined by the cutoff radius , which realizes the physical assumption that atomic energies should depend only on the local environment. Since all configurations are atom-centered, the neighbor position vectors are defined relative to the central atom. The weight factor could be used to distinguish the th species in a multi-component system, but for simplicity is set to one and the superscript on is dropped in the following.
The second step is to project onto a set of orthonormal basis functions on the ball of radius . Similar to Bartók et al. Bartók et al. (2013), this projection is carried out by expanding as
[TABLE]
where is a radial basis function, is a spherical harmonic, and and are hyperparameters specifying the respective radial and angular resolutions. Although orthogonal radial basis functions should be preferred to minimize redundant information, Bartók et al. Bartók et al. (2013) neglected the appropriate weight factor for the spherical coordinate system and did not select orthogonal radial basis functions for their SO(3) and bispectrum descriptors, perhaps explaining the poor performance of these descriptors in their numerical experiments. Subsequent publications involving the SOAP descriptors Szlachta et al. (2014) do use orthogonal radial basis functions, but this point is discussed further in Section III.2. Apart from orthogonality, the radial basis functions should be defined to have vanishing values and first and second derivatives at the cutoff to ensure continuity of forces and elastic properties Zhou and Jones (2011). Motivated by these requirements, we propose a set of radial basis functions constructed from linear combinations of the spherical Bessel functions.
Let be the linear combination of spherical Bessel functions
[TABLE]
where and are constants, is the th spherical Bessel function of the first kind, is the th root of , and is the cutoff. Since by definition, the objective is to find and such that and . Combining the two differentiation rules for spherical Bessel functions in the Supplementary Material (SM) and solving for the roots of first and second derivatives indicates that both conditions can be satisfied if the coefficients in Eq. 2 satisfy
[TABLE]
for an arbitrary multiplicative constant . The value of is fixed by requiring that the be normalized with respect to the inner product, leading to
[TABLE]
as an explicit equation for the . A set of orthonormal radial basis functions can then be defined by applying the Gram-Schmidt process to the for , with details provided in the SM.
Observing that and , the evaluation of the radial basis functions simplifies considerably for the case. The equation for reduces to
[TABLE]
The radial basis functions can then be defined by the recursion relations
[TABLE]
initialized with and . By construction, they satisfy the orthonormality condition
[TABLE]
appropriate for functions on the ball of radius . Several examples of the and their first derivatives are shown in Fig. 1.
With a suitable set of orthonormal basis functions defined on the ball of radius , the expansion coefficients in Eq. 1 can be written in terms of the relative spherical coordinates of the neighbors of the th atom:
[TABLE]
While the depend on the orientation of the coordinate system, the power spectrum obtained from
[TABLE]
is rotationally invariant Bartók et al. (2013). We therefore propose to use the real-valued as local structural descriptors for neural networks. The number of descriptors is , and the accuracy of the expansion in Eq. 1 increases with and . That is, larger values of and include more terms in the approximation and more precisely specify the local environment. On the other hand, increasing the number of descriptors increases the cost of evaluating the NNP. While a local environment with neighboring atoms requires precisely descriptors to describe the relative positions of all the atoms, we observe that more descriptors are often required in practice.
II.2 Neural Network Potential
Artificial Neural Networks (ANNs) have experienced a significant surge of interest in the last two decades after their success in various classification and regression problems Hinton et al. (2012); Yegnanarayana (2009); LeCun et al. (2015). In principle, NNs obey a universality theorem in that they are theoretically capable of reproducing any nonlinear functional relationship Csáji (2001). This encouraged their use for fitting PESs, where complex nonlinear relationships can exist between atomic configurations and atomic energies. While several different procedures have been proposed to develop NNPs Behler (2011); Bholoa et al. (2007); Lorenz et al. (2004); Blank et al. (1995), the Behler–Parinello construction Behler and Parrinello (2007) is followed here. The total energy of the system is decomposed into a sum of atomic contributions :
[TABLE]
Each atomic energy is calculated from the local chemical environment by an atomic NN. This atom-centered approach enables the modeling of systems with a variable number of atoms, overcoming a limitation of early NNs in the chemistry literature Prudente and Neto (1998); Bittencourt et al. (2004); Brown et al. (1996).
The type of NNs that is generally used for fitting PESs is a feed-forward neural network (FFNN) Bebis and Georgiopoulos (1994) in which information only passes in a single direction towards the output layer. There is one input layer that feeds the relative atomic positions into the network and one output layer containing the atomic potential energy . Some number of intervening hidden layers actually perform the regression, and the number of layers and the number of neurons in each layer are empirically optimized for the intended application. An example of a FFNN with one hidden layer is presented in Fig. 2.
Let the hyperbolic tangent be the transfer function for the hidden layers. The argument of the th neuron in the first hidden layer is
[TABLE]
the argument of the th neuron in the th hidden layer is
[TABLE]
and the value for the output neuron is
[TABLE]
where is the number of layers, is the weight that binds the th neuron in the th layer to the th neuron in the th layer, and is the bias for the th neuron of the th layer. The weights and biases constitute the parameter space to be fitted during the training of the NN.
The number of hidden layers is of great importance and can dramatically affect both the accuracy and performance of MD simulations. Additional layers enhance the ability of the NN to fit complex functions, but have the drawback of increasing the number of weights and biases to optimize, possibly slowing down or even hindering the training process Hochreiter (1998). Redundant layers and neurons can also cause over-fitting, meaning that the NN becomes less capable of extrapolating to configurations outside of the training set. In regression problems such as PES fitting, this can be a severe problem and significantly reduce the reliability of the NNP in energy and force predictions. Therefore, it is preferable to use the smallest possible number of layers and neurons that achieve the desired error when building the NN. We decided to use a single hidden layer after observing that additional layers did not substantially improve the fitting accuracy.
The NNs were trained using the standard back-propogation Rumelhart et al. (1986) and stochastic gradient descent Robbins and Monro (1985) algorithms. The root mean square error (RMSE)
[TABLE]
was used to quantify the error after each epoch, where is the total number of training points and and are the predicted and actual potential energies, respectively. The mini-batch size was 100 for all simulations. All of the training processes were performed in Python using Keras with the TensorFlow backend Chollet (2015); Abadi et al. (2016).
Given the atomic energies, the forces acting on each atom can be computed from the gradient of . This requires repeated application of the chain rule due to the dependence of the descriptors on the atomic positions. Let the th component of the force on the th atom be . This is obtained by summing the contributions from all atoms in the system by
[TABLE]
where is the th Cartesian coordinate of the th atom, is the th descriptor for the th atom, and is the number of descriptors. The derivatives depend only on the NN architecture and can be calculated by back-propogation. The derivatives of the proposed descriptors with respect to the Cartesian coordinates and other details of the force calculation are provided in the SM. Note that the force includes contributions from the dependence of the neighboring atoms’ energies on the position of the th atom, and from the dependence of the energy of the th atom on its own position—displacing the th atom by effectively displaces the surrounding atoms by -, contributing to the total force.
II.3 Training Data
The training data set should generally be prepared carefully, as the selection of configurations to include can significantly affect the performance and accuracy of the NN. If an atomic configuration that is not adequately represented in the reference set occurs during simulation, the error in the predicted potential energy could increase dramatically. This can be addressed by directly sampling points in diverse regions of the configuration space, e.g., by considering all possible structures represented on the phase diagram Behler et al. (2008), but there is no guarantee that other configurations would not occur in simulation. A second option would be to employ an importance sampling method to enhance the flexibility and extrapolation capability of NNs. Since our main intention is to investigate the properties of the proposed descriptors rather than develop a general-purpose NNP, training configurations were only sampled from MD simulations of silicon within a limited temperature range. Sampling was performed using the algorithm proposed by Pukrittayakamee et al. Pukrittayakamee et al. (2009) and modified by Stende Stende (2017), which was observed to reduce the fitting error. The algorithm consists of sampling the local environment around a given atom at a variable interval
[TABLE]
where is the total force acting on the th atom and is measured in units of the MD timestep. We tracked 10 atoms throughout an MD simulation using the Stillinger–Weber potential Stillinger and Weber (1985), calcualted the corresponding forces, and sampled training set configurations at intervals specified by . The inverse relationship between and ensures that high-gradient regions on the PES are more equitably represented in the training data, and is observed to reduce the fitting error. Note that and are system-dependent parameters.
III Results and Discussion
The performance of the descriptors proposed in Section II.1 in an NNP for solid-state silicon is compared with that of the BP descriptors and the SOAP descriptors. The NNP is further validated by measuring the elastic constants of solid-state silicon. The Stillinger–Weber potential Stillinger and Weber (1985) is selected as the ground truth, and was used to calculate the energies of all configurations in the training set.
III.1 Behler–Parinello Descriptors
The BP descriptors were one of the earliest sets of descriptors used for MLPs, and are still used for this purpose. Following Behler Behler and Parrinello (2007), the radial symmetry functions and angular symmetry functions are defined as
[TABLE]
where is a cutoff function and , , and are adjustable parameters. These functions are designed to create a set of real-valued numbers from the atomic distances and bond angles . A recent study Stende (2017) developed a single hidden layer NNP for silicon using the BP descriptors and a 24-10-1 architecture. The performance of this NNP is compared to one using our descriptors with a 25-10-1 architecture (the number of descriptors is not exactly the same because of indexing). The RMSE for the validation set was evaluated as a function of sampling temperature while keeping the training set and all other hyperparameters fixed. The results in Fig. 3 suggest that our descriptors provide considerably more information about the local environment and result in a more accurate NNP than the BP descriptors. Alternatively, considerably fewer of our descriptors would be required to construct an NNP of a given accuracy, reducing the expense of force calculations in MD simulations. Moreover, the proposed descriptors contain few adjustable parameters (, and ) and should therefore be widely applicable with minimal calibration, whereas the parameters , , and need to be adjusted for the BP descriptors.
We also observed that NNs using the BP descriptors were more difficult to train than ones using our descriptors. Similar to other ML algorithms, NNs often require detailed pre-processing of the input data to obtain reasonable results. One frequent problem is saturation of some hidden neurons during training, resulting in trapping around a local minimum that prevents further learning. This is related to the vanishing gradient problem, which is one of the most common issues with artificial neural networks and happens more frequently when some input values are much larger than others. The wide variation in the magnitudes of the BP descriptors, as indicated by Fig. 4, likely caused the observed difficulties with training. While Behler suggested several pre-processing techniques to overcome this issue Behler (2011), we found that the problem could be solved by initializing the weight matrices with values from the Xavier normal distribution Glorot and Bengio (2010) with a variance of where is the number of neurons in the th layer. By contrast, training with our descriptors progressed the same regardless of the weight initialization and without any additional pre-processing. The only advantage of the BP descriptors we observed was that they required roughly half the time to evaluate (with our naive implementations), but this seems to be strongly outweighted by the advantage in accuracy.
III.2 SOAP Descriptors
Bartók, Kondor, and Csányi initially introduced the Smooth Overlap of Atomic Positions (SOAP) descriptors Bartók et al. (2013) to support modeling the PES as a Gaussian process Bartók et al. (2010). Rather than directly using the SOAP descriptors as inputs into an MLP though, an inner product of normalized descriptor vectors (the SOAP kernel) is generally employed to measure the similarity of a pair of atomic environments. If is the vector of SOAP descriptors for the th atom, the SOAP kernel comparing the environments around the th and th atoms is defined by means ofSzlachta et al. (2014)
[TABLE]
where and are adjustable parameters. The quantity has been said to be a metric De et al. (2016), though the identity property of a metric requires that the distance vanish if and only if the configurations around the th and th atoms are identical. Consider the case where, for every atom in the neighborhood of the th atom, there is a corresponding pair of atoms separated by an arbitrarily small distance in the same position relative to the th atom. The expansion coefficients for the two configurations would then differ by roughly a factor of two, the SOAP descriptors by roughly a factor of four, and the distance could be made arbitrarily close to zero by adjusting the value of . That is, the quantity does not satisfy the identity property and is not a metric. The difficulty seems to be essential in that, if the vectors of SOAP descriptors were not normalized, the magnitude of would not be bounded above, and the value for which environments are considered similar would no longer be unique. The existence of this counterexample does little to inspire confidence that there are not others, particularly since this is a function in a high-dimensional space where intuition is difficult to develop.
Instead of the SOAP kernel, this study uses the SOAP descriptors as inputs for an NNP. The derivation of the proposed descriptors is closely related to that of the SOAP descriptors in several respects; a neighbor density function is projected onto a set of orthogonal basis functions, and the descriptors are given by inner products of vectors of the expansion coefficients. That said, there are several significant differences. First, the neighbor density function for the proposed descriptors is a sum of Dirac delta functions, whereas that for the SOAP descriptors is a sum of Gaussians. This has the consequence that evaluating the SOAP descriptors involves a relatively expensive numerical integration, whereas the proposed descriptors can be found merely by evaluating the relevant basis functions at the neighboring atoms’ positions. Using a sum of Gaussians is said to improve the stability of the SOAP descriptors with respect to perturbations of the atoms’ positions Bartók et al. (2013), but the differentiability of the basis functions in Section II.1 is sufficient to give the proposed descriptors the same property. Second, the SOAP descriptors are given by the inner products of vectors of expansion coefficient with different values of :
[TABLE]
Depending on the radial basis functions this could help to couple information from different spherical shells within the domain, but increases the number of descriptors and the computational expense of evaluating an NNP for fixed and . The proposed descriptors instead depend on a set of orthonormal basis functions with strong radial overlap (visible in Fig. 1), obviating the need for such explicit coupling.
Evidence that these differences do not degrade the performance of the proposed descriptors relative to the SOAP descriptors is given in Fig. 5. The performance of an NNP trained with of the proposed descriptors is nearly identical to that of an NNP trained with of the SOAP descriptors for the optimal value of the Guassian width in the neighbor density function. Additionally, the proposed descriptors have several advantages that are not visible from this figure. First, computing of the proposed descriptors for training points requires \sim$$0.2 seconds whereas computing of the SOAP descriptors requires \sim$$9.5 seconds (with our naive implementations). The special function evaluations and numerical integrations required for the SOAP descriptors would likely be expensive even in optimized code. Second, the second derviatives of the proposed descriptors are continuous to atoms passing through the domain boundary, whereas only the first derivatives are continuous for the SOAP descriptors Szlachta et al. (2014). This is significant because discontinuous second derivatives of the potential energy have been observed to lead to discontinuous elastic constants and anomalous thermal transport in MD simulations Zhou and Jones (2011). Third, the proposed descriptors do not require an arbitrary choice of cutoff function, and the number of adjustable parameters is smaller than for the SOAP descriptors. Specifically, the proposed descriptors only require that , , and be specified, whereas the SOAP descriptors have up to six adjustable parameters Szlachta et al. (2014) if the Gaussian widths in the neighbor density function and in the raw radial basis functions are allowed to be independent. Setting these adjustable parameters introduces additional complexity, with Fig. 5 showing the sensitivity of NNP performance to the value of one of them.
III.3 NNP Validations
We further investigated the performance of NNPs using our descriptors at a variety of sampling temperatures and NN architectures, with the results reported in Table 1. The number of accessible configurations in an MD simulation usually increases rapidly with temperature, meaning that any given accuracy would require more descriptors to encode the neighborhoods and training points to cover the configuration space. The parameters and in Eq. 1 set the number of descriptors, with higher values resulting in more terms in the approximation of the neighbor density function and generally lower fitting errors. The average number of neighbors varied from to within the selected temperature range, implying that a minimum of to descriptors were required. This is consistent with our observations that using more than descriptors () did not substantially decrease the RMSE, and is consistent with the number of descriptors used in other NNP studies Raff et al. (2005); Behler and Parrinello (2007); Behler et al. (2008); Artrith and Behler (2012). Moreover, when the temperature was elevated to 1500 K (the melting point of silicon is 1687 K), NNs using descriptors consistently outperformed those using descriptors; the higher average number of neighbors at these temperatures allowed more complex configurations that required more descriptors.
Using more than one hidden layer or more than ten hidden neurons did not substantially improve the accuracy of the NNP. Table 1 indicates that using more hidden neurons actually decreased the accuracy, perhaps as a consequence of the increased complexity of the training process. This differs from previous studies that used the BP descriptors in two-layer NNPs for single-species systems Behler and Parrinello (2007); Behler et al. (2008); Artrith and Behler (2012). Artrith and Behler Artrith and Behler (2012) further mentioned that monocomponent systems typically require to symmetry functions to achieve a complete description, and used in their study. The reason for the difference in behavior with that observed here is not known, but is conjectured to be related to our descriptors deriving from an efficient expansion of the neighbor density function using orthogonal basis functions, and to our radial basis functions effectively coupling information in multiple spherical shells.
Finally, the NNP developed here was added as a new pair-style to LAMMPS. The bulk modulus, shear modulus and Poisson’s ratio of solid-state silicon were calculated from an MD simulation using an NNP with our descriptors and a 25-10-1 architecture at 300 K, and compared with those reported for the SW potential. The results in Table 2 offer additional evidence that our NNP is able to reproduce features of the potential energy surface with excellent accuracy.
IV Conclusions
Belonging to the family of machine learning force-fields, high-dimensional neural network potentials have been found to be viable alternatives to electronic structure calculations by providing similar levels of accuracy at a lower computational cost. One crucial requirement for developing a robust neural network potential is a description of the local atomic neighborhood as a set of symmetrically-invariant real-valued numbers. Referred to as descriptors, different constructions have been proposed in the literature, but there is as yet no established canonical choice due to the recent emergence of the field. This paper introduces a new set of orthogonal descriptors that are invariant to the physical symmetries and can more efficiently represent structural environments than two of the frequent alternatives Behler and Parrinello (2007); Bartók et al. (2013).
The performance of the proposed descriptors in a neural network potential was compared to that of the Behler–Parinello descriptors and the SOAP descriptors, both commonly employed in machine learning potentials. For a given training set and comparable hyperparameters, our descriptors were found to give substantially smaller fitting errors than the Behler–Parinello descriptors, and similar fitting errors to the SOAP descriptors but at an order of magnitude lower computational cost. The superior performance of the proposed descriptors as compared to the Behler–Parinello descriptors is conjectured to be a consequence of the proposed descriptors deriving from a function expansion over orthogonal basis functions that efficiently encodes configurational information. As for the SOAP descriptors, the improved computational efficiency is a consequence of avoiding special function evaluations and numerical integration. Finally, the suitability of the proposed descriptors for machine learning potentials was verified by preliminary molecular dynamics simulations of solid-state silicon.
V Supplementary Material
The detailed derivation of the proposed radial basis functions and the force calculation procedure can be found in the Supplementary Material.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Hautier et al. [2012] Geoffroy Hautier, Anubhav Jain, and Shyue Ping Ong. From the computer to the laboratory: materials discovery and design using first-principles calculations. Journal of Materials Science , 47(21):7317–7340, 2012.
- 2Subramanian [2009] Lalitha Subramanian. Use of Force Fields in Materials Modeling. Reviews in Computational Chemistry , 16:141, 2009.
- 3Burke [2012] Kieron Burke. Perspective on density functional theory. The Journal of chemical physics , 136(15):150901, 2012.
- 4Binks and Grimes [1993] D Jason Binks and Robin W Grimes. Incorporation of monovalent ions in Zn O and their influence on varistor degradation. Journal of the American Ceramic Society , 76(9):2370–2372, 1993.
- 5Botu et al. [2016] Venkatesh Botu, Rohit Batra, James Chapman, and Rampi Ramprasad. Machine learning force fields: construction, validation, and outlook. The Journal of Physical Chemistry C , 121(1):511–522, 2016.
- 6Süle and Szendrő [2014] Péter Süle and M Szendrő. The classical molecular dynamics simulation of graphene on Ru (0001) using a fitted Tersoff interface potential. Surface and Interface Analysis , 46(1):42–47, 2014.
- 7Behler and Parrinello [2007] Jörg Behler and Michele Parrinello. Generalized neural-network representation of high-dimensional potential-energy surfaces. Physical review letters , 98(14):146401, 2007.
- 8Bartók et al. [2010] Albert P Bartók, Mike C Payne, Risi Kondor, and Gábor Csányi. Gaussian approximation potentials: The accuracy of quantum mechanics, without the electrons. Physical review letters , 104(13):136403, 2010.
