Large scale ab-initio simulations of dislocations
Mauricio Ponga, Kaushik Bhattacharya, Michael Ortiz

TL;DR
This paper introduces MacroDFT, a scalable ab-initio simulation method for studying dislocation cores and defects in crystalline materials, enabling large-scale quantum mechanical analysis with reduced computational costs.
Contribution
The paper presents MacroDFT, a novel coarse-grained density functional theory approach that allows large-scale ab-initio simulations of dislocations and defects in crystalline materials.
Findings
Successfully computed dislocation core configurations in magnesium.
Analyzed interaction energies with aluminum solutes.
Demonstrated the ability to capture elastic energy divergence at large scales.
Abstract
We present a novel methodology to compute relaxed dislocations core configurations, and their energies in crystalline metallic materials using large-scale \emph{ab-intio} simulations. The approach is based on MacroDFT, a coarse-grained density functional theory method that accurately computes the electronic structure but with sub-linear scaling resulting in a tremendous reduction in cost. Due to its implementation in \emph{real-space}, MacroDFT has the ability to harness petascale resources to study materials and alloys through accurate \emph{ab-initio} calculations. Thus, the proposed methodology can be used to investigate dislocation cores and other defects where long range elastic defects play an important role, such as in dislocation cores, grain boundaries and near precipitates in crystalline materials. We demonstrate the method by computing the relaxed dislocation cores in…
| Property | This work | ABINIT |
|---|---|---|
| E [eV] | -24.61 | -24.678 |
| B0 [GPa] | 38.74 | 38.4 |
| V0 [eV] | 42.317 | 42.351 |
| Simulation | # Atoms | # RepAtoms | #ESPs | #EPs | # Threads | # MPI processes |
|---|---|---|---|---|---|---|
| P.D.L. | 256,000 | 5,000 | 1,126,400 | 262,144,000 | 64 | 1 |
| Screw Basal | 158,000 | 4,000 | 560,395 | 163,840,000 | 64 | 1 |
| Screw Prismatic | 158,00 0 | 4,000 | 560,395 | 163,840,000 | 64 | 1 |
| Cluster Size [Atoms] | |||
|---|---|---|---|
| 7 | 19 | 37 | |
| Binding energy (non-collapsed) [eV] | 1.25 | 4.39 | 9.36 |
| Binding energy (Collapsed) [eV] | — | 2.99 | 10.13 |
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Large scale ab-initio simulations of dislocations
Mauricio Ponga
Department of Mechanical Engineering, University of British Columbia, 2054 - 6250 Applied Science Lane, Vancouver, BC, Canada, V6T 1Z4
K. Bhattacharya
Division of Engineering and Applied Science, California Institute of Technology, 1200 E. California Blvd., Pasadena CA, USA 91125.
M. Ortiz
Division of Engineering and Applied Science, California Institute of Technology, 1200 E. California Blvd., Pasadena CA, USA 91125.
Abstract
We present a novel methodology to compute relaxed dislocations core configurations, and their energies in crystalline metallic materials using large-scale ab-intio simulations. The approach is based on MacroDFT, a coarse-grained density functional theory method that accurately computes the electronic structure but with sub-linear scaling resulting in a tremendous reduction in cost. Due to its implementation in real-space, MacroDFT has the ability to harness petascale resources to study materials and alloys through accurate ab-initio calculations. Thus, the proposed methodology can be used to investigate dislocation cores and other defects where long range elastic defects play an important role, such as in dislocation cores, grain boundaries and near precipitates in crystalline materials. We demonstrate the method by computing the relaxed dislocation cores in prismatic dislocation loops and dislocation segments in magnesium (Mg). We also study the interaction energy with a line of Aluminum (Al) solutes. Our simulations elucidate the essential coupling between the quantum mechanical aspects of the dislocation core and the long range elastic fields that they generate. In particular, our quantum mechanical simulations are able to describe the logarithmic divergence of the energy in the far field as is known from classical elastic theory. In order to reach such scaling, the number of atoms in the simulation cell has to be exceedingly large, and cannot be achieved with the state-of-the-art density functional theory implementations.
Keywords: Density functional theory , large-scale ab-initio simulations , petascale simulations , screw Dislocations , prismatic dislocation loops , Magnesium.
1 Introduction
Dislocations are the main carrier of plasticity in crystalline materials. Their behavior and interactions have strong consequences in the mechanical response of materials. As such, dislocations have been extensively studied since the seminal work of Volterra [1]. Dislocations have long been analyzed by separating their effect into two main contributors, the dislocation core and its long-range field [2]. The former is characterized by strong non-linear quantum mechanical effects; the latter is a long-range elastic field characterized by smooth atomic displacements that are typically modeled using continuum elasticity theory in either its isotropic or anisotropic versions. However, the elastic fields are long-range: the interatomic distances or strains decay inversely with distance , where is the distance from the defect, so that the energy density is logarithmically divergent. This means that the core and far-field are intimately coupled and the traditional separation is problematic. This has many physical implications including the effect of the far-field stresses on the core and the interaction between the dislocation and solute atoms. Therefore, a full understanding requires ab-initio techniques [3]. However, this is not a trivial task since the slow decay of the far field and the intimate coupling means that a very large number of atoms are needed to properly describe this behavior. Unfortunately, this cannot easily be achieved with existing ab-initio methods, since they scale poorly ( in traditional approaches and in newer linear scaling approaches [4, 5]).
Different multiscale approaches have been proposed to overcome these difficulties. Most of these models consist of patching together heterogeneous models at different scales, from DFT to continuum elasticity to couple the long-range effect and the dislocations core. The coupling between the various models varies from parameter passing to hybrid Hamiltonians [6, 7, 8, 9, 10, 11]. Since these methods bring together distinct models that embody different physics and differing mathematical formulations, they typically assume separation of scales (as in parameter passing) or additional physics or constraints at the interface (as in hybrid Hamiltonians). Often times, the models need to be adjusted or calibrated to the particular phenomenon under consideration, which ultimately detracts from the predictiveness and fundamental nature of first-principles calculations. A variation is the interesting work of Woodward and collaborators [12, 13] where they used full density functional theory in a core region and a linear response (Green’s functions) outside. Still, the approach separates the domain into multiple regions with two distinct theories.
In this work, we propose a new approach where the equations of density functional theory are solved on a domain large enough to capture both the core and elastic fields in a completely seamless manner. It builds on the recently developed Coarse-Grained Density Functional Theory (CG-DFT) technique developed by Ponga et. al. [14, 15] and has it roots in the linear scaling method of Suryanarayana et al. [16]. The latter reformulates the original Kohn-Sham equations [17, 18] into density matrix form, and then approximates the density matrix using spectral Gauss quadrature rules. This enables the calculation of the electron density and other quantities of interest at any point in space at fixed or cost. Consequently the algorithm has linear scaling. However, the fact that the method is local allows the construction of an adaptive numerical approximation that has sub-linear scaling. The key idea is to exploit the decay so that we have full resolution near the core where the details are important and only sample the electronic fields far away where the atomistic displacements decay in a smooth fashion. However, this requires care: even though the atomistic displacements decay smoothly enabling a quasi-continuum approximation [19, 20], the electronic fields (like the electron density and Hartree potential) oscillate in an (almost) periodic manner on the scale of the lattice. Our key observation is that the electronic fields of interests can be written as a sum of two parts, a predictor which captures the periodicity in the far-field and a corrector that is complex at the core but decays away from it. We compute the former inexpensively via unit cell calculations at representative points, while we compute the latter on a gradually coarsening set of electronic sampling points by exploiting the local nature of our formulation. Together, this gives sub-linear scaling with the number of electrons allowing the multiscale simulations of hundreds of thousands of atoms with DFT. The technique has been implemented in the MacroDFT code. We will refer to the CG-DFT technique as well as the code as MacroDFT.
The key difference between our prior work [15] is the treatment of the predictor field. In our previous work [15], the predictor was taken to coincide with the periodic solution of the perfect stress-free crystal. While this choice is effective for defects that decay quickly, i.e., or , it is not for dislocations where the decay is much slower, i.e., . The slower decay requires us to update the predictor non-uniformly depending on the far-field deformation, and this adds significant complexity to the method. This is accomplished in this work.
We use MacroDFT to compute the formation energy of prismatic dislocation loops and basal dislocations in magnesium (Mg). Dislocations in Mg have been extensively studied using ab-initio techniques [21, 22, 23], molecular statics [24] or combination of them [25]. However, often times these simulations are carried out using incompatible boundary conditions such as periodic boundary conditions or free surfaces to patch the interface with continuum approaches, that ultimately affects the energy of the dislocation core, and possible its configuration [3]. These conditions do not reflect the true nature of a dislocation core, and can influence the structure of the dislocation core [26, 27, 28] as in the case of body-centered materials. Due to these artifacts, no simulation technique using ab-initiotechnique has been able to prove the theoretical logarithmic divergence predicted by the elasticity theory. In this work, we endeavor to address some of the limitations from other ab-initio techniques and study dislocation cores using compatible boundary conditions, and to investigate the scaling of the energy for large simulation cells. Prismatic dislocations loops are made by removing a certain amount of atoms from the bulk material in the basal plane, leading to a Burgers vector () in the component of the hexagonal closed-packed (hcp) structure of Mg. On the other hand, basal and prismatic screw dislocations are characterized by a much shorter Burgers vector, and their far field is characterized by the elastic decay. Thus, we generate a single dislocation in an infinite crystal, we measure the evolution of the strain energy as a function of the distance from the dislocation core. Our simulations show that the strain energy diverges logarithmically with the ratio , in agreement with the elasticity theory of dislocations [2]. We then compute the interaction energy of the dislocation core with a row of solute atom, and show the transition barriers as dislocation moves through it.
The manuscript is organized as follows. We start in Section 2 by introducing the methodology in terms of a density matrix formulation and its spectral representation in Riemann-Stieljes integrals. We then introduce the coarse-grained representation. Next, the computational set-up and verification of the method is discussed in Section 3. The results for prismatic loops and screw dislocations are discussed in Section 4, and the parallel performance of the method is demonstrated in 5. Finally, we summarize the work with main outcomes in Section 6.
2 Methodology
2.1 Density Functional Theory
Consider a system of atoms with electrons and let denote the position of nuclei with charges , respectively. The corresponding energy of the system according to the Local Density Approximation of Kohn-Sham DFT111We ignore spins in this presentation though they are easily incorporated. is [4, 29]
[TABLE]
where are the electronic orbitals, is the charge density and the exchange-correlation is given by
[TABLE]
with to be as proposed by Perdew and Wang [30] fitted to accurate Monte Carlo simulations carried out by Ceperley et. al. [31]. The expensive Coulombic double sums are sidestepped by introducing the electrostatic potential as the solution to the Poisson equation [32, 33].
We minimize this functional over all orbitals subject to orthonormality and this leads to the nonlinear eigenvalue problem
[TABLE]
with the ordered eigenvalues of the Hamiltonian
[TABLE]
with the Coulomb potential obtained by solving the Poisson’s equation
[TABLE]
where denotes the total charge density of the nuclei, with representing the regularized charge density of the nucleus, and
[TABLE]
This nonlinear eigenvalue problem is solved using fixed point iteration in the self-consistent formulation (SCF). This leads to an expensive computational procedure that critically restricts the size of the systems to electrons in practice. Recent real-space implementations using efficient finite element basis and finite differences can increase this number up to roughly 10,000 electrons [34, 35, 36].
With the goal of obtaining a linear scaling algorithm, we reformulate the problem above using the the density matrix formulation following Anantharaman and Cancès [37] and Wang et al. [38]. The problem of minimizing (1) is equivalent at zero temperature to the variational problem
[TABLE]
where the maximum is taken over , is the set of all trace class operators on , and is the Legendre transform of . One can see this formally by setting , using the electrostatic potential to rewrite the Coulomb interactions, using a duality transform for the exchange correlation and exchanging the order of extremization. These steps are explained in detail and justified in [38]. It is possible to show that the solution to variational problem is given by the density matrix
[TABLE]
and is the Fermi level corresponding to the constraint , the electrostatic potential that satisfies the Poisson’s equation (5), the exchange correlation potential that satisfies (6) and the charge density .
This density matrix formulation is the basis of linear scaling algorithms (). Typically, the density matrix (8) is expanded using polynomials and the expansion truncated assuming bandedness of the Hamiltonian and the decay of off-diagonal components.
We follow a different approach, the linear scaling spectral Gauss quadratures (LSSGQ) following Suryanarayana et al. [16]. The basic idea is to appeal to the spectral representation of the operator and to approximate the spectral integrals using Gauss quadratures rules. For any function ,
[TABLE]
where is the spectrum of , and is the spectral measure of contracted with . We treat the integral as a Riemann-Stieljes integral and approximate it with quadratures rules with quadrature nodes and quadrature weights to obtain the approximation shown.
We use a sixth order finite difference approximation with grid points corresponding to a uniform grid spacing , and choose an orthonormal basis corresponding to this finite difference basis to represent the Hamiltonian and other quantities. Then, it is possible to show that the ground state energy is given by [15, 16]
[TABLE]
with
[TABLE]
It then remains to determine the quadrature points and weights . We do so using Lanczos iteration [39] (see [16, 15] for details). The key observation is that since the finite difference basis has finite support, we can compute the the quantities with effort at each nodal point . So, the method automatically has linear scaling. Further, this observation means that the evaluation of all the electronic quantities of interest can be done locally (once the Fermi level is known), and this enables the coarse-graining approach described below.
Once we compute the electronic fields, we can compute the forces on the atomic nuclei (the variation of the total energy with respect to the nuclear positions ) easily using the Hellman-Feynman theorem.
2.2 Coarse-grained extension
Our goal is a sublinear scaling algorithm that will enable computation of large domains as necessary for dislocations. We observe that the interatomic distances (or strain) decay at the rate of as we go away from the dislocation core, and the corresponding electronic fields (electron density, electrostatic potential) decay to their periodic behavior at the same rate. So we introduce a computational basis that makes it possible to describe all details close to the core where it is necessary, but sample it far away where it is not. Importantly, the approach is seamless and adaptive. Given the different behavior, the coarse graining of the atomic positions and electronic fields are done differently.
Coarse grained representation of atoms
The atomistic distances may vary significantly close to the defect, but decay in a polynomial manner. So, we introduce a subset of atoms, called representative atoms (RepAtoms). This subset is dense, i.e., it contains every atom close to the defect but gradually coarsen away from it. We track the positions of these atoms and infer the position of other (non-representative atoms) from the position of the representative atoms by introducing linear finite elements shape functions based on a mesh with nodes as in the atomistic quasi-continuum formulations [15, 19, 20, 40, 41, 42, 43]. Note that this mesh is Lagrangian. Details of the atomic mesh and visual representation of them are provided in Section 3.3.
Coarse grained representation of electronic fields
The electronic fields are complex close to the defect but decay to a periodic field away from it, since the far-field is almost periodic with a period related to the unit cell. So a smooth interpolation is not appropriate. We get around it representing the electronic fields as a sum of a predictor and a corrector. The predictor is a slowly varying almost periodic function away from the defect and captures the periodicity of the electronic field in the far field. The difference between the actual electronic fields and the predictor, therefore smoothly decays away from defect. This is represented by the corrector. Since the corrector decays smoothly, we can use a smooth interpolation away from the defect.
This is shown schematically in Figure 1 with a vacancy as the defect. Figure 1(a) shows the evolution of an electronic field around the vacancy centered in the middle of the picture. For reference, a line AA’ is shown to plot the charge density later on. Next, Figures 1(b) and (c) show a contour plot of the charge density for the vacancy and the pristine sample, respectively. We intuitively see that these fields are very similar with the exception of near the defect (marked with an ”x”), where large fluctuations are seen. Thus, if we represent the difference between the charge density for the vacancy and the pristine cell, we obtain the contour map shown in Figure 1(d). We see that only near the vacancy this field is large, and very quickly decays to zero. This is the corrector field, and the pristine solution is called in this case the predictor. Putting everything together, we see in Figure 1(e) the three fields, i) the full solution with the vacancy () in black; ii) the predictor field () in red; and iii) the corrector () in blue222The corrector has been shifted down to make the visualization better.. Now, it is possible to reconstruct the full solution at any point by using an interpolation scheme provided by the FE mesh. This is schematically shown with the blue markers in the blue line of Figure 1(e), where we see a dense sampling near the vacancy, but a coarsening as one moves away from it. This is the principle for coarse-grained description of MacroDFT.
To be precise, we introduce a fine uniform spatial grid and use representation
[TABLE]
for the electrostatic field and electron densities at each of these points where are the predictors and are the correctors. We defer the description of the predictor till the next section.
To obtain the corrector with sublinear scaling, we compute it only a subset of electronic sampling points (ESPs) . We then use a finite element mesh over the ESPs and use it to interpolate these fields to the fine mesh:
[TABLE]
The ESPs are dense close to the defect, but gradually coarsen away from it. Note that this mesh is spatial or Eulerian.
Finally, note that we have two meshes, one atomistic and Lagrangian that is used to describe the relaxation of the atoms as the forces and energy are minimized; and an electronic mesh that Eulerian to solve the Kohn-Sham equations in the sample. Both meshes have adaptive resolution, as previously described. They interact through the theory and therefore, it is important to introduce two special functions that map the ESP into the elements of the , and positions to the elements of the fine mesh . For more details, the reader is referred to [15].
Total energy
We compute the atomic positions and electronic fields according to the Algorithm 1. It remains to calculate the total energy (10) of the system. We do so by sampling using cluster as follows. Let be the set or cluster of nodal points on the fine mesh contained in a ball of radius around the ESP. We approximate the total energy (10) as [15]
[TABLE]
where is the weight of the ESP and it is defined as
[TABLE]
and is the average Free-Energy at the ESP computed as
[TABLE]
2.3 Selection of the predictor field
The key for an efficient and accurate coarse-grained implementation is the choice of the predictors. In our previous work [15], the predictor field was taken to coincide with the periodic solution of the perfect stress-free crystal. While this choice is effective for defects that decay quickly, i.e., or , where is the distance from the defect, it is not for dislocations where the decay is much slower, i.e., . Therefore, we need a more accurate approximation that reflects the deformation.
Recall the triangulation . In each element of this triangulation, the positions of the atoms is obtained from a linear interpolation and thus periodic with a unit cell deformed by the deformation gradient given by the atomistic element. It is important that the predictor reflect this periodicity. Therefore, we extract the deformation gradient from each atomistic element, deform the unit cell with this deformation gradient, and compute the electronic field associated with this distorted unit cell. We then extend this periodic field to the entire element. Note that this field automatically reflects the local strain. However, it can be discontinuous at the element boundaries; we remedy this by smoothing with a projection.
Thus, once the atomic positions are computed with the non-linear conjugate gradient, a new deformation gradient, , is computed per each element. The deformation gradient is then applied to a unit cell, and the electronic fields are computed on it. We then extract the corrector fields, i.e., and , and map them back to the full simulation as schematically shown in Figure 2. The calculations in the unit cell are carried out by individual nodes, and their results are stored on the disk to save memory with an MPI-file. While this puts some restrictions to the speed of the code, the time is not comparable with the amount taken to solve the DFT equations, and then it is affordable.
3 Computational set-up
We now describe the details of the simulations. The details of the meshes used in this work are described in Table 2.
3.1 Verification of the method
Verification of our method has been provided in our previous publication [15], and here we reproduce some of these results for completeness. In our formulation, we first start by investigating two main parameters that condition the accuracy and convergence of the methodology. These two parameters are the number of quadrature points, used to compute Eqs. 10 and 11, and the number of ESP used to discretize the atomic unit cell in full resolution. Figure 3 shows the convergence with respect to number of quadrature points and spatial discretization for a unit cell of Mg. We see how the difference in the energy is reduced as both quadrature points and number of ESPs is increased. In this work, we retained the spatial resolution we used in our previous work. This is about 2,300 ESPs per unit cell, which ensures a convergence of about 0.01 meV per unit cell. We have also used a sufficiently large number of quadrature points, . With these parameters, is possible to achieve cohesive energy for Mg that are converged to less than a 1 meV. Using a unit cell and full resolution, we have computed cohesive energy values, bulk modulus, and reference volume for Mg that are in good agreement with previous works that have used plane wave implementations. These values are shown in Table 1 for completeness. We can see that the values obtained with MacroDFT are in close agreement with ABINIT calculations. The stable stacking fault energy for Mg was also computed with MacroDFT using the described parameter in a simulation cell containing unit cells, i.e., 192 atoms. The values of the stable stacking fault energy was found to be mJm*-2*, in good agreement with previous works [25]. Convergence of our coarse-grained simulations has been done in our previous work [15]. In addition, MacroDFT has been used to compute the energy of twin boundaries in Mg, showing good agreement with other techniques [44]. Here, we used the same level of coarsening, that is, the element size increases in different regions of coarse representation by a factor of two. This ensures that the evaluated energy does not depend of the mesh size.
3.2 Prismatic dislocation loops in magnesium
A single crystal Mg was generated using the minimum energy configuration for the bulk crystal at Å and [15]. The overall length of the computational domain was containing atoms. A full resolution zone of was provided in the center of the simulation cell. This area provided fully resolved atomic and electronic mesh containing around ESPs. Surrounding the full resolution zone, multiple coarse-grained regions were provided for both the electronic and atomic fields. Two Delaunay triangulations over the ESPs and RepAtoms were performed, and a FE mesh was provided over these triangulations to interpolate atomic positions and corrector fields. From the center of the full resolution area, atoms were removed to generate a cluster of vacancies. The boundary conditions were set up such that the solution decay to the pristine solution at the end of the simulation cell, for both atomic positions and electronic fields. Let be the atomic displacement field, then and . This means that we enforced the displacement of the atoms to be zero out of the simulation box and forced the corrector field of the electron density and electrostatic potential to be zero outside the simulation cell too. The selection of these boundary conditions is justified based on the idea that the deviations of the electronic field and atomic displacements decay to zero far away of the defect.
3.3 Screw dislocations in magnesium
Two screw dislocations were generated in a single Mg crystal of dimensions . A basal and a prismatic screw dislocations were generated by displacing the atoms in the simulation cell using the isotropic elastic solution [2]. The dislocations were generated such that the dislocation line was aligned with the direction in the sample as shown schematically in Figure 4.
A full resolution zone of surrounding the dislocation core was provided for the . By contrast, a full resolution zone of surrounding the dislocation core was provided for the . In the full resolution region, the spatial resolution for the electronic degree of freedom is about ESPs per unit cell. The total number of ESPs in the full resolution area is around . Surrounding this area, three coarse-grained regions were provided. Each coarse-grained region had a constant element size that increased with the distance to the core. The grand total number of electronic degrees of freedom was around 560,395 ESPs. Over this set of points, a Delaunay triangulation was performed and a FE mesh was generated by using four-nodes tetrahedral elements with linear interpolation functions.
Similarly to the electronic mesh, an atomic mesh was also provided. The atomic mesh also contained a full resolution area that coincides with the full resolution electronic region. Away from the dislocation core, multiple coarse-grained regions were provided. The perfect atomic positions are used as starting point for this mesh. Figure 5(a) shows spatial distribution of the atoms for a screw dislocation as well as the FE mesh for a sample with three coarse-grained regions. Figure 5(b) and (c) show a close-up of the full resolution zone for the two slip systems considered in this work.
The coarse grained atoms on the slip plane need to have a special capability -similar to the discontinuous FE Galerkin formulation- where the elements are allowed to have a dual position in order to interpolate properly the atomic positions in both sides of the slip plane. With the screw dislocation generated, a FE mesh was constructed over the representative atoms using an -Delaunay triangulation. The -Delaunay triangulation was needed because the set of representative atoms with the screw dislocation is not a convex hull and highly distorted elements were generated with a regular Delaunay triangulation. A full resolution simulation of the previously described dislocation will involve 304 million ESPs. This leads to a total acceleration due to the spatial coarse-grained approach of 545 times.
For all simulations we used a which is consistent with other DFT studies of metals and the energy and forces were minimized by using the Polak-Ribière version of the non-linear conjugate gradient (NLCG). The boundary conditions were set up such that the atomic displacements outside the computational cell were given by the elastic solution [2]. By doing so, we can compute for all elements outside the computational box the deformation gradient a priori. Thus, for each atom outside the computational domain, the electronic field and atomic displacements are fully determined. This means that .
4 Results
4.1 Prismatic dislocation loops in magnesium
The energy of the computational cell is minimized using a NLCG and the formation energy and the binding energy were computed as
[TABLE]
and
[TABLE]
where is the formation energy for the cluster with vacancies, is the total energy of the system containing vacancies, is the energy in the pristine configuration computed as the number of atoms () times the cohesive energy per atom () in Mg. is the biding energy for vacancies, and is the formation energy of a single vacancy.
Table 3 shows the values of the formation energy and the binding energy for both the non-collapsed and collapsed configurations. The binding energy for cluster of vacancies lying on the basal plane is quite large and correspond to a very stable defect. The binding energy computed using Eq. 19 measures the difference between the energy in a cluster of vacancies and the energy of the same number of individual vacancies. Therefore, such a large binding energy indicates that vacancies will always try to regroup in clusters instead of being randomly disperse in the sample, at least at low temperatures where entropic effects are not important. Figure 6 shows the electron density through two planes at and , respectively for a sample containing 7 vacancies in the non-collapsed configuration.
It is also interesting to analyze the binding energies of the same clusters collapsed, in order to understand the formation of prismatic dislocation loops in Mg. For instance, for the sample containing 7 vacancies, the collapsed configuration was always unstable and the atoms came back to the non-collapsed configuration shown in Figure 6. On the other hand, when the number of vacancies was increased to and , the binding energy experienced a change, as shown in Table 3. The binding energy for was around 2.99 eV, and for was around 10.13 eV. This highly non-linear demeanor of the binding energy indicates a preference towards large prismatic dislocation loops and therefore, sets a minimum size of PDL that can be observed in Mg. Although configurations with larger number of vacancies are not feasible with MacroDFT, there is a clear trend in the binding energy which favors large PDLs, as observed in experiments [45, 46, 47] of Mg and nanoporous Mg alloys.
4.2 Screw dislocations in magnesium
After relaxing the dislocation core, the excess of energy for the dislocations was measured with respect to the pristine Mg configuration. To measure the energy of the dislocation core, we took a cylinder of radius centered in the dislocation core (, ) with principal axis along the dislocation line. Then, we analyzed the excess of energy for different values of . The excess energy was defined as
[TABLE]
where is the total number of atoms in the cylinder of radius , is the total energy in the same cylinder after the core has been relaxed, and is the cohesive energy of pristine Mg. The energy in the cylinder, is computed by evaluating the electronic fields for each ESP inside the cylinder using the predictor-corrector approach. Once the fields are evaluated, the total energy was computed using Eq. 17.
In the analysis of dislocations it is usually common to show the evolution of the excess energy in the cylinder as a function of the ratio , where is an arbitrary constant. It is common practice to take , and under this assumption the energy contained in the cylinder can be expressed as [2]
[TABLE]
where GPa is the shear modulus of Mg computed with MacroDFT, and is an indeterminate constant that needs to be evaluated using discrete techniques such as ab-initio or molecular dynamics calculations. Remarkably, the value is not a physical quantity as it depends on the value that is arbitrary. The logarithmic divergence of Eq. 21 indicates that in order to understand the energy of the dislocation core, the energy has to be sampled across very large samples spanning multiple Burgers vectors in order to obtain the elastic pre-factor .
Figure 9 shows the evolution of the excess energy, computed with Eq. 20, as a function of the . By adjusting the values to the results obtained with MacroDFT, we have concluded that meV/ and meV/ for the basal and prismatic slip systems, respectively. This small strain energy per unit length is very difficult to measure and illustrate the importance of coarse-grained techniques such as MacroDFT. The results also show an excellent agreement for the far field with the elastic solution (characterized by the slope of the blue curve) which lends credence to the prediction of dislocation far field obtained with MacroDFT and therefore, to the coarse-grained scheme used.
Figure 10 shows the differential displacement plots [48] structure of the dissociated core for the basal and prismatic screw dislocations. The original basal dislocation is dissociated into two partials according to the reaction
[TABLE]
where SF stands for stacking fault. By examining the atomic positions as shown in Figure 10(a) we observed that the dissociation distance was around . By comparing with previous studies of screw dislocations in the basal plane, we found that this value is slightly larger than previously reported [25]. This could be attributed to a number of reasons, including different boundary conditions, pseudo-potential used, etc. The screw dislocation is stable and does not split into two partials. This is consistent with stacking fault calculations along the prismatic plane. These calculations show that there is not stable local minima in this direction, and therefore, the prismatic dislocation cannot split.
4.3 Aluminum solute-dislocation interaction
We now proceed to measure the interaction energy between an Al solute atom and the dislocations for various position along the slip plane. In order to do so, we replace a Mg atom by an Al solute stating from the position . We only replace atoms lying on the slip plane for the two screw dislocations considered in this work. Then, the excess energy was computed as
[TABLE]
where is the energy in a cylinder of radius , containing N-1 Mg atoms and one Al atom at , and is the energy in a cylinder of radius , containing N-1 Mg atoms and one Al atom at the , which is the maximum distance from the core at which we have placed the solute atom.
Figure 11 shows the excess of energy for various positions of the solute atom for the basal and prismatic slip system (Figure 11(a) and (b), respectively). Figure 11 depicts a non-linear interaction energy between solute and dislocation core, and this interaction is very strong near the core decaying very fast relatively quickly. Remarkably, the maximum interaction energy is of the order of 50-60 meV/Angs. which is larger than the core energy of the dislocations.
5 Parallel performance
We now proceed to discuss the parallel performance and speed-up achieved with MacroDFT in MIRA an IBM BG/Q 1.6 GHz PowerPC A2 supercomputer property of Argonne National Laboratory. Measuring the efficiency and performance are essential for new DFT methodologies and codes, as most of the classical plane-wave codes scale poorly for more than 64 processors. Thus, the next generation of computational tools should be able to harness petascale resources that can further accelerate the discovery of new materials and alloys through accurate ab-initio calculations. This is challenging since both the methodology and implementation need to be highly efficient to handle thousands of processors at the same time. Here, we provide results for the strong scaling of MacroDFT using up to 262,144 threads. The IBM BG/Q 1.6 GHz PowerPC A2 node architecture has 16 physical processors, each node is capable of running 4 threads. Thus, the user can specify any combination of threads and processor per node such that the number . The ratio processors/threads has to be evaluated for each code/problem. In our case, we found that the best combination was to use processes with 64 threads, which reduces the internode communication. Using this combination, the threads share the same memory, the number of MPI communications was reduced significantly. For the benchmark, we have used the non-collapsed cluster of vacancies containing 7 vacancies described in Section 3.2 and compute one SCF evaluation using different numbers of threads.
Figure 12 shows the parallel performance for MacroDFT from 8,192 (2,048) to 262,144 (65,536) threads (processors). The results show that MacroDFT has a good efficiency up to 131,072 threads with an efficiency of , while for 262,144 threads, an efficiency of is achieved. This remarkable performance is due to the fact that the DFT formulation (Eq. 9) can be solved locally and the relevant electronic fields ESP can be evaluated with only the information of a small neighborhood surrounding the point. Since the computational representation of the ESPs is in real-space, the cost of computing the electronic fields can be distributed between multiple processors taking advantage of massively parallel computers, as only the local components of the Hamiltonian need to be updated every SCF iteration. According to the scaling performance shown in Figure 12, the time taken by the interpolation scheme is negligible in comparison with the time consumed to determine the nodes and weights of the spectral quadrature rule. By contrast, when the number of threads is increased significantly, 262,144 or more, the performance goes down due to multiple factors including load imbalance, few number of ESP per node, interpolation, and data writing to disk. Nonetheless, the performance of MacroDFT is outstanding when compared with the state-of-the-art DFT codes, which cannot even run in such a large number of processors. Therefore, real-space DFT implementations have an opportunity to take advantage of supercomputing power and over-perform traditional plane-wave based implementations.
6 Conclusions
We have developed a novel approach for the study of dislocations in crystalline materials using ab-initio techniques. In this approach, the equations of density functional theory are solved on a domain large enough to capture both the core and elastic fields in a completely seamless manner. It technique is based on three-main pillars i) reformulation of the Kohn-Sham equations into a density matrix formulation and subsequent spectral representation of the density matrix (See Section 2.1); ii) Separation of the electronic fields in predictor and corrector counterparts (Equations (13), (14) and Section 2.3); and iii) Interpolation scheme for electronic fields and displacements using finite element meshes (2.2). The proposed approach has been implemented in the MacroDFT code [42] which can be executed in MPI, thread and hybrid options according to the architecture of the high performance cluster available. The parallel efficiency of the code has been studied, achieving good parallel performance in extreme scale computers capable of running threads.
The code and method has been used to study the relaxed configurations of the atoms in multiple dislocation cores in Mg, a crystalline metallic material. For PDLs, our simulations have predicted that vacancies always bind together, at least for low temperatures where entropic effects are weak. In screw dislocations gliding in the basal and prismatic planes, we have found that the strain energy stored in the computational sample scales accordingly to the elasticity theory for both cases. The results indicate that in order to obtain good prefactors, the simulations has to be sufficiently large, of the order of a few hundred thousand atoms. The agreement of the elastic prefactor between our simulations and theory lends credence to the approach, and provides numeric evidence that the coarse-grained approach can be used in complex scenarios where quantum mechanical interactions are important, such as in dislocation cores in alloys, grain and twin boundaries.
7 Acknowledgments
We gratefully acknowledge the support of the U.S. Army Research Laboratory (ARL) through the Materials in Extreme Dynamic Environments (MEDE) Collaborative Research Alliance (CRA) under Award Number W911NF-11-R-0001 and from the Natural Sciences and Engineering Research Council of Canada (NSERC) through the Discovery Grant under Award Application Number RGPIN-2016-06114. We also are grateful to Compute Canada through the Westgrid consortium for giving access to the supercomputer grid. This research used resources of the Argonne Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC02-06CH11357.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[1] V. Volterra. Sur l’équilibre des corps élastiques multiplement connexes. Annales scientifiques de l’École Normale Supérieure , 24:401–517, 1907.
- 2[2] J.P. Hirth and J. Lothe. Theory of Dislocations . Krieger Publishing Company, 1982.
- 3[3] D. Rodney, L. Ventelon, E. Clouet, L. Pizzagalli, and F. Willaime. Ab initio modeling of dislocation core properties in metals and semiconductors. Acta Materialia , 124:633 – 659, 2017.
- 4[4] R.G. Parr and W. Yang. Density-Functional Theory of Atoms and Molecules . International Series of Monographs on Chemistry. Oxford University Press, 1989.
- 5[5] S Goedecker and G E Scuseria. Linear scaling electronic structure methods in chemistry and physics. Comput. Sci. Eng. , 5(4):14–21, July 2003.
- 6[6] F. F. Abraham, J. Q. Broughton, N. Bernstein, and E. Kaxiras. Spanning the continuum to quantum length scales in a dynamic simulation of brittle fracture. Europhysics Letters , 44:783–787, 1998.
- 7[7] G. Lu, E. B. Tadmor, and E. Kaxiras. From electrons to finite elements: A concurrent multiscale approach for metals. Physical Review B , 73:024108, 2006.
- 8[8] J. R. Kermode, T. Albaret, D. Sherman, N. Bernstein, P. Gumbsch, M. C. Payne, G. Csányi, and A. De Vita. Low-speed fracture instabilities in a brittle crystal. Nature , 455:1224–1227, 2008.
