Preconditioning and perturbative estimators in full configuration interaction quantum Monte Carlo

Nick S. Blunt; Alex J. W. Thom; Charles J. C. Scott

arXiv:1901.06348·physics.chem-ph·May 17, 2019

TL;DR

This paper introduces preconditioning and perturbative estimators in FCIQMC, significantly enhancing efficiency and accuracy by reducing noise and enabling larger time steps, demonstrated on benzene.

Contribution

It presents a novel combination of preconditioning with perturbative estimators in FCIQMC, improving efficiency and accuracy over previous methods.

Findings

01. Reduced statistical noise in perturbative corrections

02. Allowed larger time steps without time-step errors

03. Achieved more accurate results for benzene

Abstract

We propose the use of preconditioning in FCIQMC which, in combination with perturbative estimators, greatly increases the efficiency of the algorithm. The use of preconditioning allows a time step close to unity to be used (without time-step errors), provided that multiple spawning attempts are made per walker. We show that this approach substantially reduces statistical noise on perturbative corrections to initiator error, which improve the accuracy of FCIQMC but which can suffer from significant noise in the original scheme. Therefore, the use of preconditioning and perturbatively-corrected estimators in combination leads to a significantly more efficient algorithm. In addition, a simpler approach to sampling variational and perturbative estimators in FCIQMC is presented, which also allows the variance of the energy to be calculated. These developments are investigated and applied to…

Tables

Table 1: Example improvements of $E_{\textrm{var+PT2}}$ and $E_{\textrm{var+PT2}}^{\textrm{new}}$ relative to $E_{\textrm{var}}$, for a variety of systems. The frozen-core approximation and Hartree–Fock orbitals are used for molecular systems. The walker population is chosen deliberately small so that there is substantial initiator error in $E_{\textrm{var}}$. $E_{\textrm{var+PT2}}$ and $E_{\textrm{var+PT2}}^{\textrm{new}}$ have almost identical accuracy, but $E_{\textrm{var+PT2}}$ typically has smaller noise. Hubbard model calculations are performed at half filling on an $18$-site lattice, and errors here are calculated relative to FCI values. Errors for other systems are calculated relative to very accurate extrapolated benchmarks (see the main text for details). Equilibrium and stretched nuclear distances for C$_{2}$ are $R=1.24253$ Å and $R=2.0$ Å, respectively. $^{a}$ The basis set for butadiene is ANO-L-VDZP $[3s2p1d]/[2s1p]$, as used previously [Daday et al. (2012); Olivares-Amaya et al. (2015); Chien et al. (2018); Guo, Li, and Chan (2018a)].

System	$N_{w}$	$E_{var}$ error / m$E_{h}$	$E_{var+PT2}$ error / m$E_{h}$	$E_{var+PT2}$ % corrected	$E_{var+PT2}^{new}$ error / m$E_{h}$	$E_{var+PT2}^{new}$ % corrected
C₂ (equilibrium, cc-pVQZ)	$1.75 \times 10^{5}$	2.20(5)	0.10(5)	95(4)	0.05(5)	97(4)
C₂ (stretched, cc-pVQZ)	$1.23 \times 10^{5}$	3.0(1)	0.3(1)	89(6)	0.5(1)	82(5)
Formaldehyde (aug-cc-pVDZ)	$3.0 \times 10^{5}$	4.0(1)	0.3(1)	93(4)	0.02(22)	100(6)
Formamide (cc-pVDZ)	$4.8 \times 10^{6}$	7.2(3)	0.8(3)	112(8)	0.6(4)	108(8)
Butadiene ${(22 e, 82 o)}^{a}$	$8.8 \times 10^{7}$	12.9(4)	0.4(7)	97(6)	1.0(10)	92(8)
Hubbard model ( $U / t = 2$ )	$1.1 \times 10^{4}$	4.6(1)	0.6(1)	87(4)	0.51(5)	89(3)
Hubbard model ( $U / t = 4$ )	$2.5 \times 10^{5}$	66.49(9)	23.73(9)	64.3(2)	23.7(1)	64.3(2)

Table 2: Energies (shifted by $+231$ $E_{\textrm{h}}$) for benzene in a cc-pVDZ basis set with a frozen core $(30\textrm{e},108\textrm{o})$. FCIQMC was performed without preconditioning ($N_{\textrm{spawn}}=1$, $\Delta\tau=1.9\times 10^{-4}$ au), and with preconditioning ($N_{\textrm{spawn}}=150$, $\Delta\tau=0.1$ au). The FCIQMC simulations are those plotted in Fig. (7). Relative to CCSDT(Q), $E_{\textrm{ref}}$ is too high by $\sim 23$ m$E_{\textrm{h}}$ and $E_{\textrm{var}}$ by $\sim 41$ m$E_{\textrm{h}}$. For $E_{\textrm{var+PT2}}$, the noise with $N_{\textrm{spawn}}=1$ is too large for the estimate to be useful. With $N_{\textrm{spawn}}=150$, a result similar to those from CCSDT[Q] and CCSDT(Q) is obtained. Note that initiator error in $E_{\textrm{var+PT2}}$ is not fully removed here.

Method	Estimator	Energy / $E_{h}$
CCSD(T)		-0.5813
CCSDT		-0.5817
CCSDT[Q]		-0.5826
CCSDT(Q)		-0.5845
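FCIQMC ($N_{spawn} = 1$)	$E_{ref}$	-0.5609(3)
FCIQMC ($N_{spawn} = 1$)	$E_{var}$	-0.5420(5)
FCIQMC ($N_{spawn} = 1$)	$E_{var+PT2}$	-0.597(14)
FCIQMC (preconditioned, $N_{spawn} = 150$)	$E_{ref}$	-0.5612(3)
FCIQMC (preconditioned, $N_{spawn} = 150$)	$E_{var}$	-0.5435(5)
FCIQMC (preconditioned, $N_{spawn} = 150$)	$E_{var+PT2}$	-0.5833(10)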

Equations

$$|\Psi(\tau+\Delta\tau)\rangle=|\Psi(\tau)\rangle-\Delta\tau(\hat{H}-E_{S}\mathbb{1})|\Psi(\tau)\rangle,$$

$$C_{i}(\tau+\Delta\tau)=C_{i}(\tau)-\Delta\tau\sum_{j}(H_{ij}-E_{S}\delta_{ij})C_{j}(\tau).$$

$$E_{S}(\tau+A\Delta\tau)=E_{S}(\tau)-\frac{\xi}{A\Delta\tau}\ln\left(\frac{N_{\textrm{w}}(\tau+A\Delta\tau)}{N_{\textrm{w}}(\tau)}\right),$$

$$E_{\textrm{ref}}=\frac{\sum_{j}H_{0j}C_{j}}{C_{0}},$$

$$\Delta\tau<\frac{2}{E_{\textrm{max}}-E_{0}},$$

$$\bm{C}_{n+1}=\bm{C}_{n}-\gamma_{n}\bm{P}^{-1}(\bm{H}\bm{C}_{n}-E_{n}\bm{C}_{n}),$$

$$C_{i}(\tau+\Delta\tau)=C_{i}(\tau)-\frac{\Delta\tau}{H_{ii}-E}\sum_{j}(H_{ij}-E\delta_{ij})C_{j}(\tau).$$

$$S_{i}$$

$$E_{\textrm{var}}=\frac{\langle\Psi|\hat{H}|\Psi\rangle}{\langle\Psi|\Psi\rangle}.$$

$$E_{\textrm{var}}=\frac{\big\langle\sum_{ij}C_{i}^{1}H_{ij}C_{j}^{2}\big\rangle}{\big\langle\sum_{i}C_{i}^{1}C_{i}^{2}\big\rangle},$$

$$E_{\textrm{var}}=\frac{\sum_{i}C_{i}^{1}[-S_{i}^{2}/\Delta\tau+H_{ii}C_{i}^{2}]}{\sum_{i}C_{i}^{1}C_{i}^{2}},$$

$$E_{\textrm{var}}=\frac{\sum_{i}C_{i}^{1}H_{ii}C_{i}^{2}}{\sum_{i}C_{i}^{1}C_{i}^{2}}-\frac{1}{2\Delta\tau}\frac{\sum_{i}[C_{i}^{1}S_{i}^{2}+S_{i}^{1}C_{i}^{2}]}{\sum_{i}C_{i}^{1}C_{i}^{2}}.$$

$$\Delta E_{2}=\frac{1}{(\Delta\tau)^{2}}\sum_{a}\frac{S_{a}^{1}S_{a}^{2}}{E-H_{aa}},$$

$$E_{\textrm{var+PT2}}=E_{\textrm{var}}+\Delta E_{2}.$$

$$|\Phi\rangle=[E-\hat{H}_{\textrm{d}}]^{-1}\hat{H}_{\textrm{off}}|\Psi\rangle,$$

$$\Phi_{i}=\frac{1}{E-H_{ii}}\sum_{j\neq i}H_{ij}C_{j},$$

$$E_{\textrm{var+PT2}}^{\textrm{new}}=\frac{\langle\Psi|\hat{H}_{\textrm{off}}[E-\hat{H}_{\textrm{d}}]^{-1}\hat{H}|\Psi\rangle}{\langle\Psi|\hat{H}_{\textrm{off}}[E-\hat{H}_{\textrm{d}}]^{-1}|\Psi\rangle}.$$

$$\begin{aligned}\langle\Phi^{1}|\hat{H}|\Psi^{2}\rangle&=\sum_{i}\frac{[-S_{i}^{1}/\Delta\tau][-S_{i}^{2}/\Delta\tau+H_{ii}C_{i}^{2}]}{E-H_{ii}},\\&=\frac{1}{(\Delta\tau)^{2}}\sum_{i}\frac{S_{i}^{1}S_{i}^{2}}{E-H_{ii}}-\frac{1}{\Delta\tau}\sum_{i}\frac{S_{i}^{1}H_{ii}C_{i}^{2}}{E-H_{ii}},\end{aligned}$$

$$\langle\Phi^{1}|\Psi^{2}\rangle=\frac{-1}{\Delta\tau}\sum_{i}\frac{S_{i}^{1}C_{i}^{2}}{E-H_{ii}}.$$

$$\sigma^{2}=\langle\Psi|\hat{H}^{2}|\Psi\rangle-\langle\Psi|\hat{H}|\Psi\rangle^{2}.$$

$$\begin{aligned}\langle\Psi^{1}|\hat{H}^{2}|\Psi^{2}\rangle&=\sum_{j}\sum_{i}C_{i}^{1}H_{ij}\sum_{k}H_{jk}C_{k}^{2},\\&=\sum_{j}\Big([C_{j}^{1}H_{jj}+\sum_{i\neq j}C_{i}^{1}H_{ij}]\times[H_{jj}C_{j}^{2}+\sum_{k\neq j}H_{jk}C_{k}^{2}]\Big),\\&=\sum_{j}C_{j}^{1}H_{jj}^{2}C_{j}^{2}-\frac{1}{\Delta\tau}\sum_{j}[C_{j}^{1}H_{jj}S_{j}^{2}+S_{j}^{1}H_{jj}C_{j}^{2}]+\frac{1}{\Delta\tau^{2}}\sum_{j}S_{j}^{1}S_{j}^{2}.\end{aligned}$$

$$\Delta E_{2}=\frac{1}{(\Delta\tau)^{2}}\sum_{a}\frac{S_{a}^{1}S_{a}^{2}}{E-H_{aa}}.$$

$$\sigma_{\Delta E_{2}}\mathrel{\substack{\propto\\ \sim}}\frac{1}{\sqrt{N_{\textrm{contribs}}}},$$

$$N_{\textrm{contribs}}\mathrel{\substack{\propto\\ \sim}}(N_{\textrm{spawn}})^{2}.$$

Full text

Preconditioning and perturbative estimators in full configuration interaction quantum Monte Carlo

Nick S. Blunt

[email protected]

Department of Chemistry, Lensfield Road, Cambridge, CB2 1EW, United Kingdom

Alex J. W. Thom

Department of Chemistry, Lensfield Road, Cambridge, CB2 1EW, United Kingdom

Charles J. C. Scott

Department of Chemistry, Lensfield Road, Cambridge, CB2 1EW, United Kingdom

Abstract

We propose the use of preconditioning in FCIQMC which, in combination with perturbative estimators, greatly increases the efficiency of the algorithm. The use of preconditioning allows a time step close to unity to be used (without time-step errors), provided that multiple spawning attempts are made per walker. We show that this approach substantially reduces statistical noise on perturbative corrections to initiator error, which improve the accuracy of FCIQMC but which can suffer from significant noise in the original scheme. Therefore, the use of preconditioning and perturbatively-corrected estimators in combination leads to a significantly more efficient algorithm. In addition, a simpler approach to sampling variational and perturbative estimators in FCIQMC is presented, which also allows the variance of the energy to be calculated. These developments are investigated and applied to benzene $(30\textrm{e},108\textrm{o})$ , an example where accurate treatment is not possible with the original method.

I Introduction

An important goal of quantum chemistry is the more accurate and routine treatment of strongly correlated systems. For weakly correlated systems, low-order coupled cluster (CC) theory is well motivated and extremely successful [Raghavachari et al. (1989); Bartlett and Musiał (2007)], now pushing to larger systems through adaptations to reduce the scaling of the method [Riplinger and Neese (2013); Li et al. (2009); Eriksen et al. (2015)]. An ultimate goal is a similarly successful polynomial scaling method for strong correlation, and much work continues in this direction. There is a crucial need for better benchmarks to aid such development.

For this task, methods such as the density matrix renormalization group (DMRG) algorithm [White (1992); Chan (2004); Olivares-Amaya et al. (2015)], selected configuration interaction (SCI) [Huron, Malrieu, and Rancurel (1973); Schriber and Evangelista (2016); Tubman et al. (2016); Holmes, Tubman, and Umrigar (2016); Garniron et al. (2017); Loos et al. (2018)] and full configuration interaction quantum Monte Carlo (FCIQMC) [Booth, Thom, and Alavi (2009); Cleland, Booth, and Alavi (2010); Spencer, Blunt, and Foulkes (2012)] are important tools. Although they are relatively expensive compared to low-order CC, they are systematically improvable, and capable of providing near-exact benchmarks in regimes where other methods give unsatisfactory results. They are also useful beyond providing benchmarks, for example as complete active space (CAS) solvers in CASPT2 (CAS plus second-order perturbation theory) approaches [Zgid and Nooijen (2008); Ghosh et al. (2008); Thomas et al. (2015); Li Manni, Smart, and Alavi (2016); Smith et al. (2017)], or in the case of DMRG, as the method of choice in 1D or quasi-1D systems. Coupled cluster approaches that include high-order clusters also show promise for this task [Xu, Uejima, and Ten-no (2018)].

The current FCIQMC algorithm is time limited far more than it is memory limited. On a large-scale cluster, a large FCIQMC simulation may take multiple days to run, yet use a small fraction of the memory available. Moreover, we usually encounter the situation where the final statistical error is multiple orders of magnitude smaller than systematic error [for example, see Table II of Ref. (Blunt et al., 2015a)]. This suggests that it may be possible to devise a faster FCIQMC algorithm in exchange for larger statistical noise, which would be a very desirable trade-off, and there are good reasons to believe that FCIQMC can be made substantially faster than the current algorithm.

We recently demonstrated that it is possible to calculate a second-order perturbative correction to initiator error in FCIQMC [Blunt (2018)]. This correction can often remove over $85\%$ of initiator error in weakly correlated systems, and can be accumulated from existing information in FCIQMC, and therefore has little extra cost. However, for large systems or small walker populations, we find that the associated statistical noise can be very large (the opposite situation to the noise on traditional FCIQMC estimators, described above, where statistical noise is small).

Here we propose a modified algorithm where this situation is greatly improved. Specifically, it is shown that FCIQMC can be performed with preconditioning, as commonly performed in quantum chemistry (and optimization problems generally), which allows the use of a much larger time step. This is achieved at the expense of performing multiple spawning attempts per walker, which limits the savings in computer time overall. However, this regime is highly beneficial for the calculation of the perturbative corrections, often reducing statistical noise by an order of magnitude or more, resulting in a far more efficient algorithm. As such, we show that preconditioning has limited benefits to convergence time in FCIQMC, but significantly helps the calculations of perturbatively-corrected estimators.

In addition, we introduce a simpler and more efficient approach to sampling the variational energy, and also demonstrate that it is possible to sample the variance of the energy in FCIQMC, as commonly performed in variational Monte Carlo.

We recap FCIQMC in Section II. The use of preconditioning in FCIQMC is introduced in Section III, and then contrasted with the traditional approach in Section IV. A new approach to calculating estimators is discussed in Section V. Lastly, results are given in Section VI, investigating perturbative estimators and the preconditioned FCIQMC approach, with an application to benzene.

II FCIQMC

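In FCIQMC the ground-state wave function is converged upon by performing imaginary-time evolution [Booth, Thom, and Alavi (2009)], where the wave function $|\Psi(\tau)\rangle$ obeys

$$|\Psi(\tau+\Delta\tau)\rangle=|\Psi(\tau)\rangle-\Delta\tau(\hat{H}-E_{S}\mathbb{1})|\Psi(\tau)\rangle,\qquad(1)$$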

where $\hat{H}$ is the Hamiltonian, $\tau$ denotes imaginary-time and $E_{S}$ is a shift which is slowly varied to control the walker population. This evolution is performed in a basis, $\{|D_{i}\rangle\}$ , in which the components of $|\Psi(\tau)\rangle=\sum_{i}C_{i}(\tau)|D_{i}\rangle$ obey

$$C_{i}(\tau+\Delta\tau)=C_{i}(\tau)-\Delta\tau\sum_{j}(H_{ij}-E_{S}\delta_{ij})C_{j}(\tau).\qquad(2)$$

In FCIQMC, as in other QMC approaches, the wave function coefficients $\bm{C}$ are sampled by a collection of walkers. If we define the number of walkers on $|D_{j}\rangle$ as $N_{j}\in\mathbb{N}$ , then the amplitude of each walker can be defined as $C_{j}/N_{j}$ . A stochastic algorithm to perform the above evolution can then be realized by the following steps:

1. Spawning: Loop over all occupied determinants, $|D_{j}\rangle$ . For each walker on $|D_{j}\rangle$ , choose one connected determinant, $|D_{i}\rangle$ ( $i\neq j$ and $H_{ij}\neq 0$ ), with some probability ${P_{\textrm{gen}}}(i\leftarrow j)$ . Then create a spawned walker on $|D_{i}\rangle$ with amplitude $-\Delta\tau\times(H_{ij}/{P_{\textrm{gen}}}(i\leftarrow j))\times(C_{j}/N_{j})$ .

2. Death: Loop over all occupied determinants. Each determinant $|D_{i}\rangle$ spawns to itself with amplitude $-\Delta\tau(H_{ii}-E_{S})C_{i}$ .

3. Annihilation: Sum together all current and spawned walkers on each occupied determinant to get the new coefficients, $C_{i}$ .

4. Rounding: For all determinants with an absolute amplitude, $|C_{i}|$ , less than $1$ , stochastically round the absolute amplitude down to $0$ (kill the walker) with probability $1-|C_{i}|$ , or up to $1$ with probability $|C_{i}|$ .
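The four steps above can be sketched in a few lines of Python for a small dense Hamiltonian. This is a minimal illustration and not the authors' implementation: the function name `fciqmc_step`, the uniform choice of $P_{\textrm{gen}}$ over connected determinants, and the rule $N_{j}=\max(1,\mathrm{round}(|C_{j}|))$ for the number of walkers on a determinant are all assumptions made for the example.

```python
import random


def fciqmc_step(H, C, dtau, shift, rng):
    """One stochastic iteration: spawning, death, annihilation, rounding.

    H is a dense symmetric matrix (list of lists), C a list of amplitudes.
    All names and the uniform P_gen are illustrative assumptions.
    """
    n = len(C)
    spawned = [0.0] * n

    # Spawning: each walker on |D_j> samples one off-diagonal element.
    for j in range(n):
        if C[j] == 0.0:
            continue
        n_walkers = max(1, round(abs(C[j])))  # assumed definition of N_j
        connected = [i for i in range(n) if i != j and H[i][j] != 0.0]
        if not connected:
            continue
        p_gen = 1.0 / len(connected)  # uniform generation probability
        for _ in range(n_walkers):
            i = rng.choice(connected)
            spawned[i] += -dtau * (H[i][j] / p_gen) * (C[j] / n_walkers)

    # Death: the diagonal contribution is applied exactly.
    for i in range(n):
        spawned[i] += -dtau * (H[i][i] - shift) * C[i]

    # Annihilation: merge current and spawned amplitudes.
    new_C = [C[i] + spawned[i] for i in range(n)]

    # Rounding: amplitudes below 1 become 0 or +/-1 without bias.
    for i in range(n):
        a = abs(new_C[i])
        if 0.0 < a < 1.0:
            sign = 1.0 if new_C[i] > 0.0 else -1.0
            new_C[i] = sign if rng.random() < a else 0.0
    return new_C
```

Averaged over many repetitions from the same starting vector, the output of this sketch should approach the deterministic update $C_{i}-\Delta\tau\sum_{j}(H_{ij}-E_{S}\delta_{ij})C_{j}$, since both the spawning and the rounding are unbiased.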

It can be seen that the death step exactly includes the diagonal contribution to $-\Delta\tau\sum_{j}(H_{ij}-E_{S}\delta_{ij})C_{j}$ , while the spawning step corresponds to stochastically sampling off-diagonal terms. Rather than looping over all off-diagonal elements in the above summation, precisely one element is chosen for each walker, with some probability ${P_{\textrm{gen}}}(i\leftarrow j)$ . The size of the spawned amplitude must then be divided by this probability to keep the algorithm unbiased, so that the average spawned weight is correct.

The shift, $E_{S}$ , is updated slowly to oppose changes in the walker population. This is done every $A$ iterations by

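$$E_{S}(\tau+A\Delta\tau)=E_{S}(\tau)-\frac{\xi}{A\Delta\tau}\ln\left(\frac{N_{\textrm{w}}(\tau+A\Delta\tau)}{N_{\textrm{w}}(\tau)}\right),\qquad(3)$$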

where $\xi$ is a damping parameter, and $N_{\textrm{w}}=\sum_{i}|C_{i}|$ is the total walker population.
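As a small illustration, the shift update $E_{S}(\tau+A\Delta\tau)=E_{S}(\tau)-\frac{\xi}{A\Delta\tau}\ln[N_{\textrm{w}}(\tau+A\Delta\tau)/N_{\textrm{w}}(\tau)]$ can be written as a one-line function. The function and argument names below are hypothetical, chosen only for this sketch.

```python
import math


def update_shift(shift, n_w_new, n_w_old, dtau, n_iters, damping):
    """Shift update applied every `n_iters` (A) iterations.

    `damping` plays the role of xi; `n_w_new` and `n_w_old` are the total
    walker populations at the end and start of the interval.
    """
    return shift - damping / (n_iters * dtau) * math.log(n_w_new / n_w_old)


# A growing population lowers the shift, which opposes further growth.
print(update_shift(0.0, 1100.0, 1000.0, dtau=0.001, n_iters=10, damping=0.1))
```

If the population is unchanged over the interval, the logarithm vanishes and the shift is left as it was.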

Note that the above definition of the FCIQMC algorithm uses non-integer walker amplitudes, $C_{i}$ , as first suggested by Umrigar and co-workers [Petruzielo et al. (2012)]. This differs from the original FCIQMC presentation [Booth, Thom, and Alavi (2009)], where integer values of $C_{i}$ were enforced. The use of non-integer coefficients improves the efficiency of the method. In the same work [Petruzielo et al. (2012)], Umrigar and co-workers also introduced a semi-stochastic adaptation, in which the projection operator is applied exactly within an important subspace (the deterministic or core space) and by the above stochastic algorithm otherwise, further reducing stochastic noise.

The energy is commonly estimated by

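$$\begin{aligned}E_{\textrm{ref}}&=\frac{\langle D_{0}|\hat{H}|\Psi\rangle}{\langle D_{0}|\Psi\rangle},&\qquad(4)\\&=\frac{\sum_{j}H_{0j}C_{j}}{C_{0}},&\qquad(5)\end{aligned}$$

where the subscript ‘$0$’ refers to the Hartree–Fock determinant or other reference state. A related estimator has been used [Petruzielo et al. (2012)] where $|D_{0}\rangle$ is replaced by a multi-determinant trial wave function, which again reduces stochastic noise in the estimates.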

II.1 The initiator approximation and walker blooms

The above algorithm allows the exact FCI wave function to be sampled without bias. However, in practice a population plateau appears in the simulation, below which the fermion sign problem leads to uncontrollable noise [Spencer, Blunt, and Foulkes (2012)]. This plateau height therefore sets a minimum memory requirement on the simulation, which is typically much smaller than that required to store the FCI space, but which nonetheless grows exponentially with the system size. As such, the FCIQMC algorithm as stated above is still restricted to small systems.

To overcome this, Cleland et al. introduced the initiator approximation to FCIQMC, known as i-FCIQMC [Cleland, Booth, and Alavi (2010, 2011)]. In this, all determinants with a weight greater than $n_{a}$ are defined as initiators (with $n_{a}$ equal to $2$ or $3$ , typically). Initiators are allowed to spawn to any determinant, while non-initiators may only spawn to already-occupied determinants. Attempted spawnings from non-initiators to unoccupied determinants are removed from the simulation. An exception occurs if two non-initiators spawn to the same determinant in the same iteration, in which case the spawnings are allowed (the ‘coherent spawning rule’). When the semi-stochastic adaptation [Petruzielo et al. (2012); Blunt et al. (2015b)] is used, all determinants within the deterministic space are also made initiators. We note that in some cases this deterministic space may be large, in which case the initiator error can change significantly.

This initiator approach significantly reduces the sign problem in the method, allowing arbitrarily-small walker populations to be used. In exchange, an approximation is introduced, as the Hamiltonian is effectively truncated; the above approximation is equivalent to setting Hamiltonian elements between non-initiators and unoccupied determinants to zero. As the walker population is increased, the number of initiators and occupied determinants both increase, and i-FCIQMC tends towards the exact solution. Therefore, i-FCIQMC provides a systematic way to converge to the FCI limit.
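The initiator rules can be illustrated with a short filter over a list of attempted spawnings. This is a sketch under stated assumptions rather than the scheme used in any particular code: the name `filter_spawns`, the dictionary representation of the walker list, and the choice to judge each non-initiator spawning independently of initiator spawnings to the same determinant are all made up for the example.

```python
from collections import defaultdict


def filter_spawns(C, spawns, n_a=3.0, coherent_rule=True):
    """Apply the initiator criterion to attempted spawnings.

    C: dict mapping determinant label -> current amplitude.
    spawns: list of (parent, target, amplitude) tuples.
    Returns a dict of accepted spawned amplitude per target determinant.
    """
    accepted = defaultdict(float)
    pending = defaultdict(list)  # non-initiator spawns to unoccupied targets

    for parent, target, amp in spawns:
        is_initiator = abs(C.get(parent, 0.0)) > n_a
        target_occupied = C.get(target, 0.0) != 0.0
        if is_initiator or target_occupied:
            accepted[target] += amp
        else:
            pending[target].append((parent, amp))

    if coherent_rule:
        # Keep spawnings that arrive from two or more distinct non-initiators.
        for target, events in pending.items():
            if len({parent for parent, _ in events}) >= 2:
                accepted[target] += sum(amp for _, amp in events)

    return dict(accepted)
```

Setting `coherent_rule=False` discards every non-initiator spawning to an unoccupied determinant, which corresponds to the truncation of the Hamiltonian described above without the coherent-spawning exception.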

An important concept in i-FCIQMC is that of a “walker bloom”. A bloom is defined as a spawning event with weight greater than $n_{a}$ , such that the new determinant instantly becomes an initiator. Such events should be avoided, as they lead to essentially random determinants being made initiators. Furthermore, with enough bloom events we find that the initiator space grows exponentially in $\tau$ , and a sign problem returns. This criterion is often used to set the time step, $\Delta\tau$ , which is chosen so as to prevent bloom events (or to allow only a small number to occur each iteration).

III FCIQMC with preconditioning

III.1 Algorithm definition

Imaginary-time evolution as described in Section II will converge to the ground state of $\hat{H}$ only if $\Delta\tau$ is chosen to obey

$$\Delta\tau<\frac{2}{E_{\textrm{max}}-E_{0}},\qquad(6)$$

where $E_{\textrm{max}}$ and $E_{0}$ are the highest and lowest energy eigenvalues of $\hat{H}$ , respectively. For large systems and basis sets, we find that this condition restricts the time step to be of order $\Delta\tau\sim 10^{-3}$ au, or even smaller. Typically, FCIQMC may take on the order of $10^{3}$ to $10^{5}$ iterations for the initial transient to decay, after which sampling of the ground state can begin.
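The origin of the bound $\Delta\tau<2/(E_{\textrm{max}}-E_{0})$ can be checked numerically. Assuming the shift has settled at $E_{S}=E_{0}$, one step of the linear propagator multiplies the component along an eigenvector with eigenvalue $E_{k}$ by $1-\Delta\tau(E_{k}-E_{0})$, and convergence requires this factor to have magnitude below $1$ for every excited state. The function names and the toy eigenvalues below are assumptions for illustration only.

```python
def amplification_factor(dtau, e_k, e_0):
    """Per-iteration factor on the eigencomponent with eigenvalue e_k,
    assuming the shift equals the ground-state energy e_0."""
    return 1.0 - dtau * (e_k - e_0)


def max_stable_time_step(e_max, e_0):
    """Upper bound on the time step for the unpreconditioned propagation."""
    return 2.0 / (e_max - e_0)


# Toy spectrum with E_0 = -1 and E_max = 9 (arbitrary units).
bound = max_stable_time_step(9.0, -1.0)
for dtau in (0.19, 0.21):
    factor = amplification_factor(dtau, 9.0, -1.0)
    status = "decays" if abs(factor) < 1.0 else "grows"
    print(f"dtau={dtau}: bound={bound}, highest-state component {status}")
```

The highest eigenvalue sets the limit because it gives the most negative factor; a wide spectrum therefore forces a small time step even when the low-lying states would tolerate a much larger one.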

Preconditioning is a commonly-used approach to speed up the iterative solution of a system of linear equations [Axelsson (1994); Beauwens (2004); Saad (2011)]. For an eigenvalue problem $\bm{H}\bm{C}=E\bm{C}$ , an iterative solution may be obtained by

$$\bm{C}_{n+1}=\bm{C}_{n}-\gamma_{n}\bm{P}^{-1}(\bm{H}\bm{C}_{n}-E_{n}\bm{C}_{n}),\qquad(7)$$

where $\bm{P}$ (or often $\bm{P}^{-1}$ ) is referred to as the preconditioner, $\bm{C}_{n}$ and $E_{n}$ are best estimates of $\bm{C}$ and $E$ from iteration $n$ , and $\gamma_{n}$ is a step size. Setting $\bm{P}=\bm{I}$ and $\gamma_{n}=\Delta\tau$ returns imaginary-time propagation as in FCIQMC. However, for an appropriate choice of $\bm{P}^{-1}$ , convergence can be sped up considerably. The most common choice is the Jacobi preconditioner, defined as $P_{ij}=(H_{ii}-E)\delta_{ij}$ , which is widely used throughout quantum chemistry, such as in the Davidson method [Saad (2011)]. It should also be noted that the update coefficients are essentially equal to those obtained through first-order perturbation theory.

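We therefore suggest the following update equation for FCIQMC:

$$C_{i}(\tau+\Delta\tau)=C_{i}(\tau)-\frac{\Delta\tau}{H_{ii}-E}\sum_{j}(H_{ij}-E\delta_{ij})C_{j}(\tau).\qquad(8)$$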

For consistency, we have again used $\Delta\tau$ to denote the step size. However, it should be emphasized that taking the limit $\Delta\tau\to 0$ does not result in the imaginary-time Schrödinger equation, and equal values of $\Delta\tau$ do not give equal rates of convergence with and without preconditioning, so care should be taken in comparisons.

Exactly as has been done for FCIQMC with imaginary-time propagation, it is simple to write down an FCIQMC algorithm for the above preconditioned evolution:

1. Spawning: Loop over all occupied determinants, $|D_{j}\rangle$ , and for each walker perform $N_{\textrm{spawn}}$ spawning attempts. For each spawning attempt from $|D_{j}\rangle$ , choose one connected determinant, $|D_{i}\rangle$ ( $i\neq j$ and $H_{ij}\neq 0$ ), with some probability ${P_{\textrm{gen}}}(i\leftarrow j)$ . Then create a spawned walker on $|D_{i}\rangle$ with amplitude $-\Delta\tau\times(H_{ij}/{P_{\textrm{gen}}}(i\leftarrow j))\times(C_{j}/(N_{\textrm{spawn}}N_{j}))$ .

2. Apply the preconditioner to spawnings: Loop over determinants to which spawnings have occurred. For a spawned walker on $|D_{i}\rangle$ , multiply its amplitude by $1/(H_{ii}-E)$ .

3. Death: Loop over all occupied determinants. Multiply each determinant’s amplitude by $1-\Delta\tau$ .

4. Annihilation: Sum together all current and spawned walkers on each occupied determinant to get the new coefficients, $C_{i}$ .

5. Rounding: For all determinants with an absolute amplitude, $|C_{i}|$ , less than $1$ , stochastically round the absolute amplitude down to $0$ (kill the walker) with probability $1-|C_{i}|$ , or up to $1$ with probability $|C_{i}|$ .

Most of the algorithm is the same as for FCIQMC with imaginary-time evolution, and we will compare the two algorithms in Section IV. In particular, the annihilation and rounding steps are identical. The main differences are that spawned walkers now have the preconditioner $1/(H_{ii}-E)$ applied, and the death step is also appropriately modified (simplified, in fact) to account for this same factor. We have also chosen to allow each walker to make ${N_{\textrm{spawn}}}$ spawning attempts, so that the amplitude of each spawned walker must be divided by the same factor to keep the algorithm unbiased. Here, ${N_{\textrm{spawn}}}$ is some integer equal to $1$ or greater. In previous applications of FCIQMC, this has always been taken as ${N_{\textrm{spawn}}}=1$ .
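The simplified death step can be seen by splitting the sum in the preconditioned update into its diagonal and off-diagonal parts. The short derivation below is added for illustration and uses only the symbols already defined.

```latex
\begin{aligned}
C_{i}(\tau+\Delta\tau)
  &= C_{i}(\tau) - \frac{\Delta\tau}{H_{ii}-E}
     \Big[ (H_{ii}-E)\,C_{i}(\tau) + \sum_{j\neq i} H_{ij} C_{j}(\tau) \Big] \\
  &= (1-\Delta\tau)\,C_{i}(\tau)
     - \frac{\Delta\tau}{H_{ii}-E} \sum_{j\neq i} H_{ij} C_{j}(\tau).
\end{aligned}
```

The first term is the death step, a uniform factor of $1-\Delta\tau$ that no longer involves $H_{ii}$ or a shift, and the second term is the spawning contribution with the preconditioner applied. For $\Delta\tau=1$ the first term vanishes, so the new coefficient is determined entirely by the preconditioned spawnings.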

One may ask precisely when the evolution of Eq. (8) converges. Fixed points of this evolution are when $\bm{H}\bm{C}-E\bm{C}=\bm{0}$ , as desired. Theoretically, convergence is guaranteed provided the iteration matrix has a spectral radius less than $1$ , which is met for diagonally-dominant matrices (although this is not necessary). In practice, the proposed evolution is well established as being extremely successful. We have tested this for a range of systems, including both molecular and model systems with both weak and strong correlation, and have always found convergence to occur for $\Delta\tau=0.5$ au or smaller, and often with $\Delta\tau=1.0$ au.

The use of much larger time steps has an important consequence, which must be emphasized: the size of each spawned walker is proportional to $\Delta\tau$ , so that larger spawned walkers will be created with larger time steps. Having very large spawning events (i.e., larger than $4$ or so) can significantly increase the stochastic noise in a simulation. The use of multiple spawning attempts per walker ( ${N_{\textrm{spawn}}}>1$ ) was introduced above as a way to counter this. The size of each spawned walker will be proportional to $\Delta\tau/{N_{\textrm{spawn}}}$ , giving a way to reduce the maximum spawning size by increasing ${N_{\textrm{spawn}}}$ . This will increase the cost per iteration, therefore reducing the savings of using a large $\Delta\tau$ , a point which we will return to.
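A sketch of one preconditioned iteration with multiple spawning attempts is given below, again for a small dense Hamiltonian. It is an illustration under assumptions, not the authors' code: the name `preconditioned_fciqmc_step`, the uniform $P_{\textrm{gen}}$, the rule $N_{j}=\max(1,\mathrm{round}(|C_{j}|))$, and the treatment of $E$ as a fixed input argument (rather than an estimate updated from the latest spawnings) are all choices made for the example.

```python
import random


def preconditioned_fciqmc_step(H, C, dtau, E, n_spawn, rng):
    """One stochastic iteration of the preconditioned update.

    Each walker makes `n_spawn` attempts, so every individual spawned
    amplitude is proportional to dtau / n_spawn.
    """
    n = len(C)
    spawned = [0.0] * n

    # Spawning with n_spawn attempts per walker.
    for j in range(n):
        if C[j] == 0.0:
            continue
        n_walkers = max(1, round(abs(C[j])))  # assumed definition of N_j
        connected = [i for i in range(n) if i != j and H[i][j] != 0.0]
        if not connected:
            continue
        p_gen = 1.0 / len(connected)
        for _ in range(n_walkers * n_spawn):
            i = rng.choice(connected)
            spawned[i] += -dtau * (H[i][j] / p_gen) * (C[j] / (n_spawn * n_walkers))

    # Preconditioner: scale merged spawnings on |D_i> by 1 / (H_ii - E).
    for i in range(n):
        if spawned[i] != 0.0:
            spawned[i] /= (H[i][i] - E)

    # Death (factor 1 - dtau) followed by annihilation.
    new_C = [(1.0 - dtau) * C[i] + spawned[i] for i in range(n)]

    # Rounding, unchanged from the original algorithm.
    for i in range(n):
        a = abs(new_C[i])
        if 0.0 < a < 1.0:
            sign = 1.0 if new_C[i] > 0.0 else -1.0
            new_C[i] = sign if rng.random() < a else 0.0
    return new_C
```

Doubling `n_spawn` at fixed `dtau` halves the size of each spawned amplitude while leaving the expected update unchanged, at the price of twice as many spawning attempts per iteration.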

We typically choose to initialize the wave function using configuration interaction singles and doubles (CISD), which is always feasible for systems currently amenable to FCIQMC. However, this is not required in general.

We note that this preconditioned approach is similar to a previous modification to FCIQMC and coupled cluster Monte Carlo [Thom (2010); Franklin et al. (2016)], changing the step taken by a quasi-Newton approach [Neufeld and Thom], which though implemented [HAN; Spencer et al. (2018)] has yet to be widely used. An alternative approach within a deterministic framework was explored by Zhang and Evangelista [Zhang and Evangelista (2016)], who considered a Chebyshev expansion of the exponential propagator.

III.2 Population control: intermediate normalization

As described in Section II, with imaginary-time evolution the walker population is typically controlled by a shift, $E_{S}(\tau)$ , which is updated by Eq. (3).

With preconditioning, a more natural choice is intermediate normalization. Consider the projected energy estimator, ${E_{\textrm{ref}}}$ , defined in Eq. (5), which is equally valid both with and without preconditioning. If we set the energy in the preconditioner to equal this estimate ( $E={E_{\textrm{ref}}}$ ) then it can be seen that $\sum_{j}(H_{0j}-E\delta_{0j})C_{j}=0$ . If evolving with Eq. (8), it is then simple to check that the coefficient $C_{0}$ on $|D_{0}\rangle$ remains exactly constant throughout. We note that $|D_{0}\rangle$ need not be the Hartree–Fock determinant, and can also be updated during a simulation to match the most populated determinant. This choice of population control does not restrict the method to weakly correlated systems.

For $C_{0}$ to remain exactly constant, the estimate $\sum_{j}H_{0j}C_{j}$ must be obtained from the spawnings made to $|D_{0}\rangle$ from the latest iteration. It is helpful to use a deterministic space containing $|D_{0}\rangle$ and its most important connections, to avoid the situation where no spawnings are made to $|D_{0}\rangle$ in an iteration.
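Intermediate normalization can be illustrated with a deterministic (noise-free) version of the preconditioned update, $C_{i}\leftarrow C_{i}-\frac{\Delta\tau}{H_{ii}-E}\sum_{j}(H_{ij}-E\delta_{ij})C_{j}$ with $E=E_{\textrm{ref}}$. The sketch below is an assumption-laden toy: the function names and the $3\times 3$ matrix are invented for the example, and the reference component is skipped explicitly because its residual is zero by construction.

```python
def projected_energy(H, C, ref=0):
    """E_ref = sum_j H_{ref,j} C_j / C_ref."""
    return sum(H[ref][j] * C[j] for j in range(len(C))) / C[ref]


def preconditioned_step(H, C, dtau, ref=0):
    """Deterministic preconditioned update with E set to E_ref."""
    n = len(C)
    E = projected_energy(H, C, ref)
    new_C = list(C)
    for i in range(n):
        if i == ref:
            # With E = E_ref the residual on the reference vanishes, so C_ref
            # is unchanged; skipping also avoids 0/0 when H_ref,ref equals E.
            continue
        residual = sum(H[i][j] * C[j] for j in range(n)) - E * C[i]
        new_C[i] = C[i] - dtau * residual / (H[i][i] - E)
    return new_C


# Toy diagonally dominant Hamiltonian, started from the reference alone.
H = [[-1.0, 0.1, 0.1],
     [0.1, 0.0, 0.1],
     [0.1, 0.1, 1.0]]
C = [1.0, 0.0, 0.0]
for _ in range(50):
    C = preconditioned_step(H, C, dtau=1.0)
print(C, projected_energy(H, C))
```

In this toy case the iteration settles within a few steps at a time step of $1$, with $C_{0}$ fixed at its initial value and the remaining coefficients growing from zero, which mirrors the population growth in early iterations described for this form of population control.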

With this choice of population control, the walker population will grow in the early iterations of the simulation, settling down and fluctuating about a final value once convergence has been achieved. This makes choosing a final walker population more difficult than in the original scheme. However, this can usually be achieved by performing a preliminary test with a small initial population, and then scaling appropriately.

The above modification has an effect on the projected energy estimator, defined in Eq. (5), which should be noted: the population $C_{0}$ now remains exactly constant, and so is not a random variable. Typically in FCIQMC, one would average the numerator and denominator of Eq. (5) separately, and perform the required division after this averaging. Now that the denominator is constant, this separate averaging makes no difference. This deserves consideration, as in the original approach this estimator can theoretically be biased if performed as $\langle x/y\rangle$ rather than $\langle x\rangle/\langle y\rangle$ (although any such bias is essentially negligible, in our experience). Does this preconditioned approach remove all such bias? We suspect that the answer is “no”, and that this theoretical bias is transferred to the sampling of $|\Psi(\tau)\rangle$ , due to the applications of $1/(H_{ii}-E)$ in the propagation (where $E$ is a random variable), and population control bias [Vigor et al. (2015)] due to the aggressive updates to $E$ . However, we emphasize that any such bias seems to be essentially negligible in practice.

We note that this intermediate normalization approach has recently been used in a related QMC approach to coupled cluster theory [Scott et al. (2019)]. A related approach to population control has also been used recently by Alavi and co-workers in FCIQMC with imaginary-time propagation [Alavi et al.].

III.3 The initiator approximation

In preconditioned FCIQMC, the initiator adaptation is largely unchanged: initiators are defined as determinants with an absolute amplitude $|C_{i}|$ greater than $n_{a}$ , which is set to $2$ or $3$ , typically. Attempted spawnings from initiators are always accepted, but spawnings from non-initiators are only accepted if made to already-occupied determinants, else they are removed from the simulation. We again use the semi-stochastic adaptation, where all deterministic states are also defined as initiators.

We again emphasize the importance of avoiding bloom events in the initiator approximation. Given that $\Delta\tau$ can be made much larger compared to the original FCIQMC approach, it is important then to increase ${N_{\textrm{spawn}}}$ appropriately in order to avoid walker blooms. Avoiding these large spawning events is important in any QMC approach to control statistical noise, but is perhaps particularly important in the initiator adaptation, where such blooms lead to random determinants being given initiator status.

Lastly, we note that with large ${N_{\textrm{spawn}}}$ it is necessary to remove the ‘coherent spawning’ rule of i-FCIQMC. That is, we do not allow simultaneous spawnings to an unoccupied determinant from two non-initiators to survive. For large ${N_{\textrm{spawn}}}$, such events become frequent, and we often find that a sign problem re-emerges if the rule is retained. Removing this rule has only a very small effect on the accuracy of the initiator approximation.
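The acceptance rule described in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the NECI implementation: the function name `accept_spawns`, the dictionary-based walker list, and the choice to test each spawning independently are all hypothetical simplifications.

```python
def accept_spawns(spawns, walkers, n_a=3.0, deterministic=()):
    """Apply the initiator criterion to a list of attempted spawnings.

    spawns        : iterable of (parent, child, weight) tuples
    walkers       : dict mapping determinant label -> amplitude C_i
    n_a           : initiator threshold; |C_i| > n_a makes D_i an initiator
    deterministic : labels in the deterministic space (always initiators)
    """
    accepted = {}
    for parent, child, weight in spawns:
        parent_is_initiator = (
            parent in deterministic or abs(walkers.get(parent, 0.0)) > n_a
        )
        child_is_occupied = walkers.get(child, 0.0) != 0.0
        # Each spawning is judged on its own, so two non-initiators spawning
        # to the same unoccupied determinant are both rejected
        # (i.e. there is no 'coherent spawning' rule).
        if parent_is_initiator or child_is_occupied:
            accepted[child] = accepted.get(child, 0.0) + weight
    return accepted


# Toy example: "A" is an initiator, "B" and "C" are not.
example_walkers = {"A": 5.0, "B": 1.0, "C": -1.0}
example_spawns = [
    ("A", "X", 0.5),    # initiator -> unoccupied: accepted
    ("B", "Y", 0.25),   # non-initiator -> unoccupied: rejected
    ("C", "Y", 0.25),   # second non-initiator to the same target: rejected
    ("B", "A", -0.5),   # non-initiator -> occupied: accepted
]
example_result = accept_spawns(example_spawns, example_walkers)
```

In the example, the two spawnings onto `"Y"` are discarded even though they arrive together, which is the behaviour obtained once the coherent spawning rule is removed.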

IV Comparison of FCIQMC with and without preconditioning

IV.1 Algorithm

Here we state the differences between the original and preconditioned FCIQMC approaches. Specifically, changes relative to the original FCIQMC algorithm are:

1. Once spawned walkers have been generated, the preconditioner $1/(H_{ii}-E)$ must be applied to each.

2. In the death step, a factor of $1-\Delta\tau$ is applied to each walker coefficient, rather than shifting each coefficient by $-\Delta\tau(H_{ii}-E_{S})C_{i}$.

3. The shift $E_{S}$ is replaced by a separate energy estimate, which is obtained from Eq. (5). This energy is not needed in the death step, but is instead required in the preconditioner.

4. In order to reduce the size of spawned walkers in the presence of a very large $\Delta\tau$, we allow each walker to make ${N_{\textrm{spawn}}}$ spawning attempts. In the original approach, each walker only makes $1$ spawning attempt (i.e., ${N_{\textrm{spawn}}}=1$).
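The changes listed above can be illustrated with a deterministic (noise-free) analogue of one preconditioned iteration, acting on a small dense matrix. This is only a sketch: the toy Hamiltonian, the function names, and the explicit handling of the reference coefficient are assumptions made for illustration, and real FCIQMC samples the spawning step stochastically.

```python
import numpy as np


def projected_energy(H, C, ref=0):
    """Projected energy on the reference, E = sum_j H_0j C_j / C_0."""
    return (H[ref] @ C) / C[ref]


def preconditioned_step(H, C, dtau, ref=0):
    """One noise-free preconditioned iteration; returns the new coefficients."""
    diag = np.diag(H)
    E = projected_energy(H, C, ref)
    # Spawning: expectation value of the spawning array, S = -dtau * H_off C.
    S = -dtau * (H @ C - diag * C)
    # Death step: a uniform factor (1 - dtau) on every coefficient.
    C_new = (1.0 - dtau) * C
    # Preconditioner 1/(H_ii - E), applied to S before merging with C.
    mask = np.arange(len(C)) != ref
    C_new[mask] += S[mask] / (diag[mask] - E)
    # On the reference, S_0/(H_00 - E) equals dtau*C_0 exactly, so C_0 is
    # unchanged. It is set explicitly here to avoid 0/0 when the simulation
    # starts from a single determinant (an implementation choice of this sketch).
    C_new[ref] = C[ref]
    return C_new


def toy_hamiltonian(n=8, coupling=0.05, seed=0):
    """Symmetric toy matrix with well-separated diagonal (hypothetical model)."""
    rng = np.random.default_rng(seed)
    A = rng.normal(scale=coupling, size=(n, n))
    H = 0.5 * (A + A.T)
    H[np.diag_indices(n)] = np.arange(n, dtype=float)
    return H


H_toy = toy_hamiltonian()
C_toy = np.zeros(len(H_toy))
C_toy[0] = 1.0
for _ in range(200):
    C_toy = preconditioned_step(H_toy, C_toy, dtau=0.5)

E_final = projected_energy(H_toy, C_toy)
E_exact = np.linalg.eigvalsh(H_toy)[0]
```

For this weakly coupled toy matrix the iteration settles onto the lowest eigenvector with a time step of $0.5$, while the reference coefficient stays fixed throughout.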

IV.2 Implementation

The use of preconditioning does not significantly change the implementation of FCIQMC, and only a few changes may be required to implement preconditioning in an existing FCIQMC code. In particular, all communication of spawned walkers is performed in the same manner. In both approaches, spawned walkers are held in a separate array to the current walkers. These spawned walkers are communicated to their parent process and annihilated to give a final merged spawning array, which we denote $\bm{S}$ . This spawning array may then be annihilated with the main walker list, $\bm{C}$ , to give the new walker coefficients. One can write down an expression for the expectation value of the spawning array $\bm{S}$ ,

$$\textrm{E}[\bm{S}]=-\Delta\tau\,\hat{H}_{\textrm{off}}|\Psi\rangle,$$

where $\hat{H}_{\textrm{off}}$ contains only off-diagonal elements in the basis set used. Importantly, $\bm{S}$ only contains off-diagonal elements of $\hat{H}$ because diagonal elements are accounted for separately by the death step. Note also that we have dropped the $\tau$ dependence in $\bm{S}(\tau)$ and $|\Psi(\tau)\rangle$ for notational clarity later. This identification of the spawning array will be crucial in Section V for constructing more efficient energy estimators in FCIQMC.

In the preconditioned case, the factors of $1/(H_{ii}-E)$ are then applied directly to the spawning array $\bm{S}$ after it has been communicated and merged across processors, but before it is merged with the previous walker list, $\bm{C}$.

The diagonal Hamiltonian element $H_{ii}$ can be calculated for the new determinant $|D_{i}\rangle$ in $\mathcal{O}(N)$ time from the value $H_{jj}$ of the parent walker on $|D_{j}\rangle$ , which is always stored. Therefore, it is not required to perform a full $\mathcal{O}(N^{2})$ construction of $H_{ii}$ for each spawned walker, which would be expensive.

IV.3 An example: C2 cc-pVQZ

As a simple demonstration, in Fig. (1) we compare the convergence of C2 at equilibrium bond length, in a cc-pVQZ basis and with a frozen core, both with and without preconditioning. In both cases the simulation is initialized from the CISD wave function. For FCIQMC without preconditioning, we choose ${N_{\textrm{spawn}}}=1$ , and set the time step so as to prevent bloom events (giving $\Delta\tau=8\times 10^{-4}$ au), which is the standard protocol in most current FCIQMC calculations. For FCIQMC with preconditioning, we first choose a time step of $\Delta\tau=0.4$ au and then choose ${N_{\textrm{spawn}}}=200$ so as to prevent bloom events. Note that the energy estimator used here is the projected energy estimator, ${E_{\textrm{ref}}}$ , defined in Eq. (5).

It can be seen that, while FCIQMC without preconditioning requires $\sim 3\times 10^{4}$ iterations to fully converge, convergence with preconditioning is achieved within $30$ iterations. It is very important to emphasize, however, that the iteration time is roughly proportional to ${N_{\textrm{spawn}}}$ . Therefore, each iteration with ${N_{\textrm{spawn}}}=200$ is roughly $200$ times more expensive than without. Even with this taken into account, convergence is quicker with preconditioning than without, at least in this case. More careful comparison and discussion is given in Section VI.4.

V Improved estimators in FCIQMC

Separately from the above discussion of preconditioning in FCIQMC, we now discuss the calculation of improved energy estimators in FCIQMC, including perturbative corrections to initiator error. We emphasize that all of the following estimators can be calculated identically in both the original and preconditioned FCIQMC approaches. Although the following section is separate from the previous section on preconditioning in FCIQMC, we will show in Section VI.2 that the preconditioned approach greatly benefits the calculation of PT2-based estimators in FCIQMC, and so we will ultimately be interested in their application together.

V.1 Sampling variational energies without reduced density matrices

In addition to the projected energy estimators (comprising both projections onto single determinants and multi-determinant trial solutions), variational energy estimators have also been used in FCIQMC (Overy et al., 2014; Blunt, Booth, and Alavi, 2017):

$$E_{\textrm{var}}=\frac{\langle\Psi|\hat{H}|\Psi\rangle}{\langle\Psi|\Psi\rangle}.$$

Consider the numerator. Because $|\Psi\rangle$ is a stochastic estimate, the replica trick must be used to ensure that this estimator is unbiased, as has been described elsewhere (Zhang and Kalos, 1993; Overy et al., 2014; Blunt et al., 2014; Blunt, Alavi, and Booth, 2015). In this, two independent FCIQMC simulations are performed, which we label $|\Psi^{1}\rangle=\sum_{i}C_{i}^{1}|D_{i}\rangle$ and $|\Psi^{2}\rangle=\sum_{i}C_{i}^{2}|D_{i}\rangle$. Then the variational estimate can be obtained as

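$$E_{\textrm{var}}=\frac{\big\langle\langle\Psi^{1}|\hat{H}|\Psi^{2}\rangle\big\rangle}{\big\langle\langle\Psi^{1}|\Psi^{2}\rangle\big\rangle},$$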

where $\big{\langle}\ldots\big{\rangle}$ denotes an average over the simulation after convergence, and we assume real coefficients throughout. We drop this averaging notation for clarity, but it should be understood that the numerator and denominator of each estimator are averaged (separately) over the simulation from convergence onwards.
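The need for two replicas can be seen in a one-coefficient toy model. The sketch below assumes Gaussian noise on a single coefficient, which is purely illustrative (FCIQMC noise is not Gaussian, and all variable names are hypothetical): squaring one noisy estimate picks up the noise variance, whereas the product of two independent estimates does not.

```python
import numpy as np

rng = np.random.default_rng(1)
c_true, noise, n_samples = 2.0, 1.0, 200_000

# Two independent, unbiased noisy estimates of the same coefficient.
c1 = c_true + noise * rng.standard_normal(n_samples)
c2 = c_true + noise * rng.standard_normal(n_samples)

# Single replica: the expectation is c_true**2 + noise**2, i.e. biased.
single_replica = np.mean(c1 * c1)

# Two replicas: the expectation is c_true**2, i.e. unbiased.
two_replicas = np.mean(c1 * c2)
```

The same cancellation applies term by term to quantities that are quadratic in the walker coefficients, such as $\langle\Psi^{1}|\hat{H}|\Psi^{2}\rangle$ and $\langle\Psi^{1}|\Psi^{2}\rangle$.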

In previous FCIQMC studies, Eq. (12) has been calculated as $\textrm{Tr}(\hat{\Gamma}\hat{H})$ , where $\hat{\Gamma}$ is the two-particle density matrix (2-RDM), whose calculation in FCIQMC was described in Refs. (Overy et al., 2014) and (Blunt, Booth, and Alavi, 2017). The efficient implementation of 2-RDMs in FCIQMC is involved, and their accumulation can slow the simulation down by a significant factor. In this study we therefore calculate ${E_{\textrm{var}}}$ and related quantities directly.

The two main large arrays in an FCIQMC implementation are $\bm{C}$ (with components $C_{i}^{r}=\langle D_{i}|\Psi^{r}\rangle$ ) and $\bm{S}$ (with components $S_{i}^{r}=-\Delta\tau\sum_{j\neq i}H_{ij}C_{j}^{r}$ ) as defined already. $\bm{S}$ is distributed across processes with the same mapping as $\bm{C}$ , such that it is easy to take a dot product between $\bm{C}$ and $\bm{S}$ . In the following, we therefore write all estimators in terms of $\bm{C}$ and $\bm{S}$ , showing how they are efficiently calculated in practice. Again we emphasize that these arrays are constructed in the same manner for both original and preconditioned algorithms, so that all of the following applies for both approaches. In the preconditioned case, the preconditioner is applied to $\bm{S}$ only after the following estimators are constructed.

${E_{\textrm{var}}}$ may be calculated as

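$$E_{\textrm{var}}=\frac{\sum_{i}C_{i}^{1}\left[H_{ii}C_{i}^{2}-S_{i}^{2}/\Delta\tau\right]}{\sum_{i}C_{i}^{1}C_{i}^{2}},$$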

and statistical errors can be reduced by making use of spawnings from both replicas:

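$$E_{\textrm{var}}=\frac{\sum_{i}\left[C_{i}^{1}H_{ii}C_{i}^{2}-\frac{1}{2\Delta\tau}\left(C_{i}^{1}S_{i}^{2}+S_{i}^{1}C_{i}^{2}\right)\right]}{\sum_{i}C_{i}^{1}C_{i}^{2}}.$$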

Note that only the expectation value of this estimator is variational. Instantaneous estimates are not.
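As a check on the bookkeeping, the following sketch evaluates the variational energy from $\bm{C}$ and $\bm{S}$ alone and compares it with a direct evaluation of $\langle\Psi^{1}|\hat{H}|\Psi^{2}\rangle/\langle\Psi^{1}|\Psi^{2}\rangle$. It assumes a small dense symmetric matrix and uses the noise-free expectation value of the spawning array, $S_{i}=-\Delta\tau\sum_{j\neq i}H_{ij}C_{j}$; the function names are hypothetical.

```python
import numpy as np


def spawning_array(H, C, dtau):
    """Expectation value of the spawning array, S = -dtau * H_off C."""
    return -dtau * (H @ C - np.diag(H) * C)


def e_var_from_arrays(C1, C2, S1, S2, diag, dtau):
    """Variational energy from C and S, using spawnings from both replicas."""
    diagonal_part = np.sum(diag * C1 * C2)
    off_diagonal_part = -(C1 @ S2 + S1 @ C2) / (2.0 * dtau)
    return (diagonal_part + off_diagonal_part) / (C1 @ C2)


rng = np.random.default_rng(0)
n, dtau = 6, 0.01
A = rng.normal(size=(n, n))
H = 0.5 * (A + A.T)                       # real symmetric toy Hamiltonian
C1 = rng.normal(size=n) + 3.0 * np.eye(n)[0]
C2 = C1 + 0.1 * rng.normal(size=n)        # a second, slightly different replica

S1 = spawning_array(H, C1, dtau)
S2 = spawning_array(H, C2, dtau)

e_arrays = e_var_from_arrays(C1, C2, S1, S2, np.diag(H), dtau)
e_direct = (C1 @ H @ C2) / (C1 @ C2)
```

Only element-wise products and dot products between arrays that share the same distribution across processes are needed, which is why this form is convenient in practice.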

V.2 Perturbative corrections to initiator error

Given that the variational energy estimator ${E_{\textrm{var}}}$ is based on an inexact wave function subject to the initiator approximation, we recently suggested (Blunt, 2018) a second-order perturbative correction to this estimator

$$\Delta E_{2}=\frac{1}{(\Delta\tau)^{2}}\sum_{a}\frac{S_{a}^{1}S_{a}^{2}}{E-H_{aa}},\tag{16}$$

where the summation is performed over all spawnings which are cancelled due to the initiator criterion, and there is a normalization factor of $\langle\Psi^{1}|\Psi^{2}\rangle$ . From this, a total energy estimate can be defined as

$$E_{\textrm{var+PT2}}=E_{\textrm{var}}+\Delta E_{2}.$$

The above formula for $\Delta E_{2}$ was constructed by analogy with SCI+PT2, where configuration interaction is performed within a truncated space, beyond which a second-order Epstein-Nesbet perturbative correction can be constructed. Initiator FCIQMC can also be loosely seen as a truncated method, which allows the above estimator to be written down by analogy, making use of spawned walkers which are otherwise thrown away without use. However, a truncated space for i-FCIQMC is somewhat poorly defined, first because the space of occupied and initiator determinants is non-constant, and second because some unoccupied determinants are connected to both initiators and non-initiators (as such, the truncation is more precisely on the Hamiltonian, not the space).

To make this perturbative correction more rigorous, we consider a slightly different estimator, which we call ${E_{\textrm{var+PT2}}^{\textrm{new}}}$ , which can then be compared to ${E_{\textrm{var+PT2}}}$ for any deviations. Given the wave function within the initiator approximation, $|\Psi\rangle$ , it is possible to write down a more accurate wave function which we denote $|\Phi\rangle=\sum_{i}\Phi_{i}|D_{i}\rangle$ ,

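$$|\Phi\rangle=\frac{1}{E-\hat{H}_{\textrm{d}}}\,\hat{H}_{\textrm{off}}|\Psi\rangle,\qquad\Phi_{i}=\frac{1}{E-H_{ii}}\sum_{j\neq i}H_{ij}C_{j}=-\frac{S_{i}}{\Delta\tau\,(E-H_{ii})},$$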

where $\hat{H}_{\textrm{d}}=\sum_{i}H_{ii}|D_{i}\rangle\langle D_{i}|$ . An energy estimator based upon this improved wave function can then be written down as

$$E_{\textrm{var+PT2}}^{\textrm{new}}=\frac{\langle\Phi|\hat{H}|\Psi\rangle}{\langle\Phi|\Psi\rangle}.$$

This expression can be expanded in terms of $\bm{C}$ and $\bm{S}$ components to give an estimator for use in FCIQMC. First the numerator,

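$$\begin{aligned}\langle\Phi^{1}|\hat{H}|\Psi^{2}\rangle&=\sum_{i}\Phi_{i}^{1}\,\langle D_{i}|\hat{H}|\Psi^{2}\rangle\\&=\sum_{i}\Phi_{i}^{1}\left[H_{ii}C_{i}^{2}-\frac{S_{i}^{2}}{\Delta\tau}\right]\\&=\frac{1}{(\Delta\tau)^{2}}\sum_{i}\frac{S_{i}^{1}S_{i}^{2}}{E-H_{ii}}-\frac{1}{\Delta\tau}\sum_{i}\frac{S_{i}^{1}H_{ii}C_{i}^{2}}{E-H_{ii}},\end{aligned}$$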

and similarly the denominator by

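$$\langle\Phi^{1}|\Psi^{2}\rangle=\sum_{i}\Phi_{i}^{1}C_{i}^{2}=-\frac{1}{\Delta\tau}\sum_{i}\frac{S_{i}^{1}C_{i}^{2}}{E-H_{ii}}.$$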

As for ${E_{\textrm{var}}}$ , the cross terms including $C_{i}$ and $S_{i}$ can be averaged with both combinations of replicas $1$ and $2$ , both in the numerator and denominator.

It can be seen that all of the terms in ${E_{\textrm{var+PT2}}}$ are also included in ${E_{\textrm{var+PT2}}^{\textrm{new}}}$ . The connection of ${E_{\textrm{var+PT2}}^{\textrm{new}}}$ with perturbation theory is made precise in Appendix (A). However, a simple way to see this connection is to note that $|\Phi\rangle$ can be expressed as $|\Psi_{0}\rangle+|\Psi_{1}\rangle$ , where $|\Psi_{1}\rangle$ is the first-order Epstein-Nesbet correction to an appropriate zeroth-order wave function, $|\Psi_{0}\rangle$ . It is then simple to show that ${E_{\textrm{var+PT2}}^{\textrm{new}}}$ includes a second-order perturbative correction.

The estimator ${E_{\textrm{var+PT2}}^{\textrm{new}}}$ has the advantage that it requires no partitioning between a variational and non-variational space. Furthermore, ${E_{\textrm{var+PT2}}^{\textrm{new}}}$ takes the form $\langle\Phi|\hat{H}|\Psi\rangle/\langle\Phi|\Psi\rangle$, where $|\Psi\rangle$ and $|\Phi\rangle$ are both wave functions accessible from FCIQMC. This is the form of a traditional estimator in a QMC method, and avoids explicitly adding a perturbative correction. On the other hand, ${E_{\textrm{var+PT2}}}$ usually has smaller statistical noise, and so is often more useful in practice.

Note that in Eq. (16), (24) and (26), we calculate $E$ using the projected energy estimator, ${E_{\textrm{ref}}}$ . We could also use ${E_{\textrm{var}}}$ , but find that this makes little difference in practice [as $E$ is well separated from any $H_{ii}$ , particularly for Eq. (16)].
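The relationship between the two PT2-based estimators can be checked in an idealised, noise-free setting. The sketch below makes several assumptions that go beyond the text: $|\Psi\rangle$ is taken to be the exact ground state of a fixed truncated space (the SCI+PT2 analogue, so that every spawning leaving that space counts as cancelled), $E$ is set to ${E_{\textrm{var}}}$, and $|\Phi\rangle$ is taken as $(E-\hat{H}_{\textrm{d}})^{-1}\hat{H}_{\textrm{off}}|\Psi\rangle$, i.e. one preconditioned iteration with unit time step. All names and the toy matrix are hypothetical.

```python
import numpy as np


def toy_hamiltonian(n=10, coupling=0.1, seed=0):
    """Symmetric toy matrix with diagonal 0, 1, 2, ... (hypothetical model)."""
    rng = np.random.default_rng(seed)
    A = rng.normal(scale=coupling, size=(n, n))
    H = 0.5 * (A + A.T)
    H[np.diag_indices(n)] = np.arange(n, dtype=float)
    return H


dtau = 0.01                      # arbitrary: it cancels in every estimator
H = toy_hamiltonian()
diag = np.diag(H)

# Zeroth-order wave function: ground state within the first n_var determinants.
n_var = 4
evals, evecs = np.linalg.eigh(H[:n_var, :n_var])
C = np.zeros(len(H))
C[:n_var] = evecs[:, 0]
e_var = evals[0]
E = e_var                        # energy used in the denominators

# Noise-free spawning array, S = -dtau * H_off C (same for both replicas here).
S = -dtau * (H @ C - diag * C)
outside = np.arange(len(H)) >= n_var   # targets of 'cancelled' spawnings

# E_var+PT2: add the second-order correction from cancelled spawnings only.
delta_e2 = np.sum(S[outside] ** 2 / (E - diag[outside])) / dtau**2 / (C @ C)
e_var_pt2 = e_var + delta_e2

# E_var+PT2^new: ratio <Phi|H|Psi>/<Phi|Psi>, summing over all determinants.
numerator = (np.sum(S * S / (E - diag)) / dtau**2
             - np.sum(S * diag * C / (E - diag)) / dtau)
denominator = -np.sum(S * C / (E - diag)) / dtau
e_var_pt2_new = numerator / denominator

e_exact = np.linalg.eigvalsh(H)[0]
```

In this idealised case the two estimators coincide, because $\Phi_{i}=C_{i}$ inside the truncated space; in i-FCIQMC they agree only within statistical error, as discussed in Section VI.1.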

V.3 Sampling the variance of the energy

Finally, we point out that it is simple to write the energy variance, $\sigma^{2}$ , as efficient operations involving $\bm{C}$ and $\bm{S}$ , and therefore to sample in FCIQMC. Ignoring normalization,

$$\sigma^{2}=\langle\Psi|\hat{H}^{2}|\Psi\rangle-\langle\Psi|\hat{H}|\Psi\rangle^{2}.$$

We emphasize that this is the standard energy variance, and not some measure of statistical error. The calculation of $\langle\Psi|\hat{H}|\Psi\rangle$ has been discussed already. The calculation of $\langle\Psi|\hat{H}^{2}|\Psi\rangle$ is performed (using replica sampling) as:

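$$\begin{aligned}\langle\Psi^{1}|\hat{H}^{2}|\Psi^{2}\rangle&=\sum_{i}\langle\Psi^{1}|\hat{H}|D_{i}\rangle\langle D_{i}|\hat{H}|\Psi^{2}\rangle\\&=\sum_{i}\left[H_{ii}C_{i}^{1}-\frac{S_{i}^{1}}{\Delta\tau}\right]\left[H_{ii}C_{i}^{2}-\frac{S_{i}^{2}}{\Delta\tau}\right]\\&=\frac{1}{(\Delta\tau)^{2}}\sum_{i}S_{i}^{1}S_{i}^{2}-\frac{1}{\Delta\tau}\sum_{i}H_{ii}\left(S_{i}^{1}C_{i}^{2}+C_{i}^{1}S_{i}^{2}\right)+\sum_{i}C_{i}^{1}H_{ii}^{2}C_{i}^{2}.\end{aligned}$$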

Note that the expression for $\sigma^{2}$ involves squaring the estimate of $\langle\Psi|\hat{H}|\Psi\rangle$ . However, this operation can be performed after averaging over the simulation, such that bias is not a concern here.
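A noise-free sketch of the variance calculation is given below. It assumes a dense toy Hamiltonian and restores the normalization that the text omits; the function names are hypothetical. The key step is that $\langle D_{i}|\hat{H}|\Psi\rangle=H_{ii}C_{i}-S_{i}/\Delta\tau$, so $\hat{H}|\Psi\rangle$ is available from $\bm{C}$ and $\bm{S}$ without any further Hamiltonian application.

```python
import numpy as np


def energy_variance(C1, C2, S1, S2, diag, dtau):
    """Normalized energy variance from the walker and spawning arrays."""
    # (H Psi)_i = H_ii C_i - S_i / dtau for each replica.
    HC1 = diag * C1 - S1 / dtau
    HC2 = diag * C2 - S2 / dtau
    norm = C1 @ C2
    h2 = (HC1 @ HC2) / norm                      # <Psi^1|H^2|Psi^2>
    h1 = 0.5 * (C1 @ HC2 + HC1 @ C2) / norm      # <Psi^1|H|Psi^2>
    # In a stochastic simulation h2, h1 and norm would each be averaged
    # over iterations before h1 is squared.
    return h2 - h1**2


def spawning_array(H, C, dtau):
    """Expectation value of the spawning array, S = -dtau * H_off C."""
    return -dtau * (H @ C - np.diag(H) * C)


rng = np.random.default_rng(0)
n, dtau = 8, 0.01
A = rng.normal(scale=0.1, size=(n, n))
H = 0.5 * (A + A.T)
H[np.diag_indices(n)] = np.arange(n, dtype=float)
diag = np.diag(H)

# Exact ground state: the variance should vanish.
C_exact = np.linalg.eigh(H)[1][:, 0]
S_exact = spawning_array(H, C_exact, dtau)
var_exact = energy_variance(C_exact, C_exact, S_exact, S_exact, diag, dtau)

# Crude trial state (single determinant): the variance is positive.
C_trial = np.eye(n)[0]
S_trial = spawning_array(H, C_trial, dtau)
var_trial = energy_variance(C_trial, C_trial, S_trial, S_trial, diag, dtau)
var_direct = C_trial @ H @ H @ C_trial - (C_trial @ H @ C_trial) ** 2
```

The vanishing variance for the exact eigenvector is what makes $\sigma^{2}$ a candidate measure of initiator error.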

The energy variance could be useful as a measure of initiator error in i-FCIQMC. It could also be used to calculate improved excitation energies in i-FCIQMC by variance matching (Robinson, Pineda Flores, and Neuscamman, 2017). In previous applications of excited-state FCIQMC (Blunt et al., 2015a), the same walker population was used for ground and excited states. Since excited states require larger walker populations for similar accuracy, this leads to an imbalance in accuracy between the two states. We expect that variance matching could improve this situation, and could perhaps also benefit model space QMC (Ten-no, 2013; Ohtsuka and Ten-no, 2015) in the same way.

Figure (2) shows convergence of $\sigma^{2}$ with iteration number (with preconditioning and $\Delta\tau=1.0$ au) and walker population per replica (${N_{\textrm{w}}}$) for the Hubbard model at $U/t=4$, on a periodic two-dimensional $18$-site lattice at half-filling. The lattice is the same as that presented in Supplemental Material of Ref. (Blunt, Alavi, and Booth, 2015). As expected, $\sigma^{2}$ tends to $0$ as the walker population is increased and initiator error removed.

VI Results

The results are structured as follows. Example results for the perturbatively-corrected estimators are presented in Section VI.1. In Section VI.2 it is shown that the efficiency of such estimators is greatly increased by performing multiple spawning attempts per walker (large ${N_{\textrm{spawn}}}$ ). The effect of correlation of QMC data on perturbative corrections is discussed in Section VI.3, and the convergence time of FCIQMC with preconditioning is considered in Section VI.4. Finally, we show application to a larger example, benzene.

All molecular geometries are presented in supporting information. The geometries of formamide and benzene were taken from Ref. (Schreiber et al., 2008). The geometry for butadiene was taken from Ref. (Daday et al., 2012).

The initiator threshold $n_{a}$ was set to $3.0$ for all systems except for C2, where it was set to $2.0$ (for consistency with results in Ref. [Blunt et al., 2015a]).

The preconditioned approach was implemented in NECI (Ref. NEC), which was used for all FCIQMC results. Integral files were generated with PySCF (Sun et al., 2017). CC benchmarks were obtained with MRCC (Kállay et al., 2013; Bomble et al., 2005; Kállay and Gauss, 2005). SCI+PT2 benchmarks were obtained using the SHCI approach (Holmes, Tubman, and Umrigar, 2016; Sharma et al., 2017) with Dice (Ref. Dic).

VI.1 Results for perturbative corrections to initiator error

Table 1 shows examples of the correction made by ${E_{\textrm{var+PT2}}}$ and ${E_{\textrm{var+PT2}}^{\textrm{new}}}$ relative to ${E_{\textrm{var}}}$ , for a variety of systems. Walker populations are chosen so that substantial initiator error exists in ${E_{\textrm{var}}}$ . Hubbard model calculations are performed at half-filling on the same lattice as used in Fig. (2). Hartree–Fock orbitals were used for molecular systems. Each error is calculated relative to either the exact FCI energy (for Hubbard model examples) or a very accurate extrapolated estimate (for molecular examples). Benchmarks for C2 are extrapolated SCI+PT2 values from Ref. (Holmes, Umrigar, and Sharma, 2017). We also obtained benchmarks for formaldehyde and formamide using extrapolated SCI+PT2 (for formamide, these SCI+PT2 calculations used orbitals optimized by performing active-active rotations in an SHCI calculation with a threshold of $\epsilon=2\times 10^{-4}$ , as described in Ref. [Smith et al., 2017]). The benchmark for butadiene is an extrapolated DMRG+PT2 result of $-155.557567$ ${E_{\textrm{h}}}$ from Ref. (Guo, Li, and Chan, 2018a).

The molecular systems considered are weakly correlated and so the PT2 correction is expected to be effective, which is found to be the case. The correction here is typically $>85\%$ , as was found in Ref. (Blunt, 2018). The correction is less effective for the Hubbard model as the coupling strength is increased.

Results for ${E_{\textrm{var+PT2}}}$ and ${E_{\textrm{var+PT2}}^{\textrm{new}}}$ are seen to be essentially identical within error bars. This is expected for the reasons discussed in Section V.2. However, the statistical error on ${E_{\textrm{var+PT2}}}$ is usually smaller than that on ${E_{\textrm{var+PT2}}^{\textrm{new}}}$, so that ${E_{\textrm{var+PT2}}}$ is generally preferable (although some exceptions occur, particularly for model systems, as seen for the Hubbard model at $U/t=2$ in Table 1). ${E_{\textrm{var+PT2}}}$ has the disadvantage that its derivation relies on a somewhat poorly defined zeroth-order space within the initiator approximation. In practice, however, it gives essentially identical results to ${E_{\textrm{var+PT2}}^{\textrm{new}}}$ with smaller noise.

It is also interesting to consider the convergence of each estimator in a simulation. An example is shown in Fig. (3) for formamide in a cc-pVDZ basis and with a frozen core $(18\textrm{e},54\textrm{o})$. Preconditioning was used with parameters $\Delta\tau=0.03$ au and ${N_{\textrm{spawn}}}=60$. The walker population was initialized from $10^{6}$ and grew to a final value of $6.3\times 10^{7}$. It is found that ${E_{\textrm{ref}}}$ converges more slowly than ${E_{\textrm{var}}}$. Note also that ${E_{\textrm{var+PT2}}}$ is equal to ${E_{\textrm{var}}}$ at initialization. This is because this definition of the PT2 correction only has contributions from spawnings cancelled due to the initiator criterion. All walkers are initialized within the deterministic space and therefore are initiators, and so the PT2 correction as defined in Eq. (16) is initially $0$. Meanwhile ${E_{\textrm{var+PT2}}^{\textrm{new}}}$ initializes from a much lower energy since it takes the form $\langle\Phi|\hat{H}|\Psi\rangle/\langle\Phi|\Psi\rangle$, where $|\Phi\rangle$ is immediately a much better estimate than $|\Psi\rangle$. However, both ${E_{\textrm{var+PT2}}^{\textrm{new}}}$ and ${E_{\textrm{var+PT2}}}$ converge to the same value once the simulation has equilibrated.

VI.2 Statistical error on perturbative corrections

Although ${E_{\textrm{var+PT2}}}$ and ${E_{\textrm{var+PT2}}^{\textrm{new}}}$ typically have a much smaller systematic (initiator) error than ${E_{\textrm{ref}}}$ and ${E_{\textrm{var}}}$ , they tend to have a much larger statistical error (noise). This is sometimes manageable, but becomes severe for large systems and small walker populations. To see why, consider the PT2 correction as it appears in ${E_{\textrm{var+PT2}}}$ :

$$\Delta E_{2}=\frac{1}{(\Delta\tau)^{2}}\sum_{a}\frac{S_{a}^{1}S_{a}^{2}}{E-H_{aa}}.$$

The summation is over all spawnings cancelled due to the initiator criterion. A similar term appears in estimators ${E_{\textrm{var+PT2}}^{\textrm{new}}}$ and $\sigma^{2}$ (where the summation is performed over all spawnings, which does not affect the following argument).

The space sampled by the spawnings $S^{1}_{a}$ and $S^{2}_{a}$ contains up to double excitations from the occupied space, which is very large in general. Because replica sampling is required, a contribution to $\Delta E_{2}$ can only be made if spawnings from both replicas occur to the same determinant in the same iteration. As the space sampled becomes larger, or the number of spawned walkers becomes smaller, this becomes increasingly rare.

The preconditioned approach here allows one to perform fewer iterations ( ${N_{\textrm{iterations}}}$ ) with a larger number of spawning attempts per walker ( ${N_{\textrm{spawn}}}$ ). It can be shown that this approach leads to smaller noise on ${E_{\textrm{var+PT2}}}$ , ${E_{\textrm{var+PT2}}^{\textrm{new}}}$ and $\sigma^{2}$ , and improved efficiency overall. This can be seen by the following argument. Roughly, we expect the statistical error on an estimator such as $\Delta E_{2}$ to obey

$$\sigma_{\Delta E_{2}}\propto\frac{1}{\sqrt{N_{\textrm{contribs}}}},$$

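where $N_{\textrm{contribs}}$ is the number of contributions to an estimate. Since a contribution is made only if two spawnings occur to the same determinant from two independent replicas, the chance that a given spawning finds a partner is roughly proportional to the density of spawnings in the other replica. With the number of spawnings in each replica itself proportional to ${N_{\textrm{spawn}}}$, the number of contributions scales as

$$N_{\textrm{contribs}}\propto N_{\textrm{spawn}}^{2}.$$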

This is an upper limit which will become less accurate as the space spawned to becomes saturated, i.e. for large ${N_{\textrm{spawn}}}$ or a small number of orbitals. Assuming this holds, then

$$\sigma_{\Delta E_{2}}\propto\frac{1}{N_{\textrm{spawn}}}.$$

However, for a total real simulation time $T$ the number of iterations performed scales roughly as ${N_{\textrm{iterations}}}\propto T/{N_{\textrm{spawn}}}$. Since $\Delta E_{2}$ is averaged over all iterations, we also have $\sigma_{\Delta E_{2}}\propto 1/\sqrt{{N_{\textrm{iterations}}}}$. So for a constant simulation time $T$, as ${N_{\textrm{spawn}}}$ is increased,

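$$\sigma_{\Delta E_{2}}\propto\frac{1}{N_{\textrm{spawn}}}\times\sqrt{N_{\textrm{spawn}}}=\frac{1}{\sqrt{N_{\textrm{spawn}}}},\tag{36}$$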

and the efficiency (with respect to estimation of $\Delta E_{2}$ ) follows

$$\frac{1}{\sigma_{\Delta E_{2}}^{2}\,T}\propto N_{\textrm{spawn}}.$$

Therefore, performing multiple spawning attempts per walker provides one way to greatly reduce the error on ${E_{\textrm{var+PT2}}}$, ${E_{\textrm{var+PT2}}^{\textrm{new}}}$ and $\sigma^{2}$. It should be emphasized that the above argument holds in FCIQMC both with and without preconditioning. However, preconditioning allows $\Delta\tau$ to be increased such that using a large value of ${N_{\textrm{spawn}}}$ will not lead to slow convergence or a long autocorrelation time, which is critical. Therefore, preconditioning with large values of ${N_{\textrm{spawn}}}$ and $\Delta\tau$ leads to a far more efficient algorithm overall.
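The quadratic growth in the number of contributions can be illustrated with a toy model in which each replica throws spawned walkers uniformly at random onto a large set of determinants. Uniform spawning is a strong simplification made only for this sketch, and the names and parameter values below are hypothetical. Coincidences are counted as the product of the spawn counts on each determinant, summed over determinants.

```python
import numpy as np


def mean_coincidences(n_spawned, n_dets, n_trials, rng):
    """Average number of replica-1/replica-2 spawning pairs on the same determinant."""
    total = 0.0
    for _ in range(n_trials):
        hits1 = np.bincount(rng.integers(n_dets, size=n_spawned), minlength=n_dets)
        hits2 = np.bincount(rng.integers(n_dets, size=n_spawned), minlength=n_dets)
        total += hits1 @ hits2
    return total / n_trials


rng = np.random.default_rng(2)
n_dets = 50_000                       # size of the space spawned into (toy value)
spawn_counts = (1000, 2000, 4000)     # proportional to N_spawn at fixed population

coincidences = {
    n: mean_coincidences(n, n_dets, n_trials=200, rng=rng) for n in spawn_counts
}

# Doubling the number of spawnings should roughly quadruple the coincidences.
ratio_first = coincidences[2000] / coincidences[1000]
ratio_second = coincidences[4000] / coincidences[2000]
```

If one instead counts only the number of distinct determinants reached by both replicas, the growth flattens once the space becomes saturated, which is the sense in which the quadratic scaling is an upper limit.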

Fig. (4) demonstrates the scaling of the statistical error estimate on ${E_{\textrm{var+PT2}}^{\textrm{new}}}$ for three systems: C2, cc-pVQZ at equilibrium bond length; Water, cc-pVQZ at equilibrium geometry; Ne, cc-pV5Z. Core electrons are frozen for each system. In each case ${N_{\textrm{spawn}}}$ is increased while holding ${N_{\textrm{spawn}}}\times{N_{\textrm{iterations}}}$ constant and also holding $\Delta\tau\times{N_{\textrm{iterations}}}$ constant (so that the final value of $\tau$ is fixed, and the total simulation time is approximately fixed). The reference population is held fixed as ${N_{\textrm{spawn}}}$ is increased, leading to final walker populations that are very similar. It is seen that increasing ${N_{\textrm{spawn}}}$ does indeed reduce the noise on ${E_{\textrm{var+PT2}}^{\textrm{new}}}$ . For example, with ${N_{\textrm{spawn}}}=1$ the error estimate for C2 is almost $5$ ${\textrm{m}E_{\textrm{h}}}$ , which is reduced to $0.6$ ${\textrm{m}E_{\textrm{h}}}$ with ${N_{\textrm{spawn}}}=100$ , and the scaling of Eq. (36) is approximately followed. This scaling is less accurate for Ne, where the number of orbitals is smaller (and so the space spawned to is smaller) and becomes saturated with spawned walkers more quickly.

VI.3 Autocorrelation length on estimators

Although ${E_{\textrm{var+PT2}}}$ and ${E_{\textrm{var+PT2}}^{\textrm{new}}}$ have larger noise than ${E_{\textrm{ref}}}$ and ${E_{\textrm{var}}}$, they have a significant advantage regarding the correlation of QMC data. This is demonstrated in Fig. (5) where the two systems studied are C2 and water, as defined in Section VI.2. To investigate the correlation of each estimator, we average each simulation into blocks of increasing length, and perform an uncorrelated error analysis using these blocks. This is simply the reblocking procedure, as described by Flyvbjerg and Petersen (1989). Note that for the simulations of C2 and water, we took a total of $2^{18}$ and $2^{19}$ iterations to average over, respectively. Therefore even with a block length of $2^{14}$, we used $2^{4}$ or $2^{5}$ data points to construct error estimates, to ensure that these estimates are reliable.

If the data is correlated then the error estimate grows with increasing block length, eventually plateauing when subsequent blocks become approximately uncorrelated. This effect is seen to be most significant for ${E_{\textrm{ref}}}$ , where for water performing an uncorrelated analysis gives an error estimate of $2.1\times 10^{-6}$ ${E_{\textrm{h}}}$ , compared to a more realistic estimate of $1.2\times 10^{-4}$ ${E_{\textrm{h}}}$ . An uncorrelated analysis of ${E_{\textrm{var}}}$ gives an error estimate of $1.1\times 10^{-4}$ ${E_{\textrm{h}}}$ compared to an accurate estimate of $2.0\times 10^{-4}$ ${E_{\textrm{h}}}$ , a much smaller but still non-negligible difference. We observe similar behavior across all systems investigated: an uncorrelated analysis typically underestimates the statistical error on ${E_{\textrm{var}}}$ by a factor of $\sim 2$ , while for ${E_{\textrm{ref}}}$ this factor is typically much larger.
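The reblocking analysis described above can be sketched as follows. This is a minimal version of the pairwise blocking transformation, tested here on synthetic autoregressive data rather than on FCIQMC output; the function names, the AR(1) model and its parameters are assumptions made for illustration.

```python
import numpy as np


def reblock(data):
    """Return block lengths and the naive standard error at each blocking level."""
    x = np.asarray(data, dtype=float)
    lengths, errors = [], []
    length = 1
    while len(x) >= 2:
        lengths.append(length)
        # Standard error of the mean, treating the blocks as uncorrelated.
        errors.append(np.std(x, ddof=1) / np.sqrt(len(x)))
        if len(x) % 2:
            x = x[:-1]                    # drop an unpaired final point
        x = 0.5 * (x[0::2] + x[1::2])     # average neighbouring blocks
        length *= 2
    return np.array(lengths), np.array(errors)


def ar1_series(n, phi, rng):
    """Correlated test data: x_t = phi * x_{t-1} + Gaussian noise."""
    noise = rng.standard_normal(n)
    x = np.empty(n)
    x[0] = noise[0]
    for t in range(1, n):
        x[t] = phi * x[t - 1] + noise[t]
    return x


rng = np.random.default_rng(3)
series = ar1_series(2**16, phi=0.9, rng=rng)
block_lengths, error_estimates = reblock(series)

naive_error = error_estimates[0]       # block length 1: ignores correlation
blocked_error = error_estimates[8]     # block length 2**8 = 256
```

For strongly correlated data the estimate at block length $1$ is several times too small, mirroring the behaviour reported for ${E_{\textrm{ref}}}$; for uncorrelated data the estimates would stay roughly flat across block lengths.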

For ${E_{\textrm{var+PT2}}}$ and ${E_{\textrm{var+PT2}}^{\textrm{new}}}$ , the error estimate remains roughly constant as the block length is increased, indicating that data is approximately uncorrelated. We observe this across all systems studied. This is helpful, as a reliable error estimate on ${E_{\textrm{var+PT2}}}$ and ${E_{\textrm{var+PT2}}^{\textrm{new}}}$ may be obtained after a relatively small number of converged iterations. We suspect the reason for this is that the error on ${E_{\textrm{var+PT2}}}$ and ${E_{\textrm{var+PT2}}^{\textrm{new}}}$ is dominated by the term such as that in Eq. (16), involving a weighted dot product across the two spawning arrays. Although the FCIQMC wave function is heavily correlated from iteration to iteration, spawned walkers are essentially uncorrelated from each other. They are only correlated through their underlying dependence on the FCIQMC wave function, which should approximately cancel out in the denominator of the estimator. This seems to be very accurate based on our observations across many systems, although we would expect this observation to be only approximate theoretically. For example, ${E_{\textrm{var+PT2}}}$ is formed as the sum of ${E_{\textrm{var}}}$ and the PT2 correction; clearly an uncorrelated analysis is not exact for the former estimate (though ${E_{\textrm{var}}}$ typically has a much smaller error than the PT2 term, perhaps explaining why this is not noticeable). We would therefore still recommend a protocol of performing many FCIQMC iterations when possible, but the situation is dramatically improved compared to that for ${E_{\textrm{ref}}}$ .

Note that the above arguments do not depend on using a large time step or preconditioning. A time step of $\Delta\tau=10^{-3}$ au was used for both examples in Fig. (5). This small autocorrelation length on ${E_{\textrm{var+PT2}}}$ and ${E_{\textrm{var+PT2}}^{\textrm{new}}}$ is a property of the estimators themselves, and not the use of preconditioning.

VI.4 Convergence time in the preconditioned approach

As demonstrated in Figs. (1) and (2), the use of preconditioning allows a large time step to be used in FCIQMC. Typically one can set $\Delta\tau=0.5$ au and achieve convergence without issue, which usually allows convergence within $20$ - $30$ iterations in our experience. Meanwhile, the original algorithm usually requires at least several thousand iterations to converge, and sometimes many more.

However, as discussed in Section III.3, setting a large time step also requires setting ${N_{\textrm{spawn}}}$ to be very large. The simulation time in FCIQMC is dominated by the generation and processing of spawned walkers, such that iteration time is roughly proportional to ${N_{\textrm{spawn}}}$ . So for a fair comparison, we should instead look at convergence speed as a function of ${N_{\textrm{iterations}}}\times{N_{\textrm{spawn}}}$ , rather than ${N_{\textrm{iterations}}}$ .

Another requirement for a fair comparison is that the time step should be chosen in a consistent manner. For this, we use the automatic system for choosing $\Delta\tau$ , implemented in NECI. As discussed already, it is important that there are few bloom events, defined as an event where a spawned walker is created with weight greater than the initiator threshold ( $n_{a}$ ). Allowing a large number of bloom events can greatly increase statistical noise and lower the efficiency of the algorithm. The automatic system in NECI looks for the largest bloom event from the previous iteration (if any), and reduces $\Delta\tau$ so that this spawning will have weight less than $n_{a}$ in future occurrences. The time step reaches a final value during convergence.
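The time-step update described above can be sketched as a simple rescaling. This is not NECI's code: the function name, the `safety` factor and the assumption that spawned weights scale linearly with $\Delta\tau$ are introduced here only to illustrate the idea.

```python
def update_time_step(dtau, spawn_weights, n_a=3.0, safety=0.99):
    """Shrink dtau if the largest spawned weight exceeded the initiator threshold.

    Spawned weights are assumed to be proportional to dtau, so rescaling dtau by
    n_a / largest brings a repeat of the same event down to the threshold; the
    hypothetical `safety` factor (< 1) pushes it slightly below.
    """
    largest = max((abs(w) for w in spawn_weights), default=0.0)
    if largest > n_a:
        dtau *= safety * n_a / largest
    return dtau


# One bloom of weight 6 with n_a = 3 halves the time step (safety disabled here).
dtau_after_bloom = update_time_step(0.5, [1.0, -6.0, 2.0], n_a=3.0, safety=1.0)

# No bloom: the time step is left unchanged.
dtau_no_bloom = update_time_step(0.5, [1.0, -2.5], n_a=3.0)
```

Because the rule only ever reduces $\Delta\tau$, the time step settles at a final value once no further blooms occur.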

In Fig. (6), convergence is considered for the same system as in Fig. (1) (C2, cc-pVQZ, equilibrium geometry), but plotted against ${N_{\textrm{iterations}}}\times{N_{\textrm{spawn}}}$ . With preconditioning we take parameters ${N_{\textrm{spawn}}}=200$ and an initial time step of $\Delta\tau=0.5$ au. For the original algorithm, we take ${N_{\textrm{spawn}}}=1$ and an initial time step of $\Delta\tau=0.0025$ au (so that the initial value of $\Delta\tau/{N_{\textrm{spawn}}}$ is consistent). It can be seen that the benefit of preconditioning is now rather limited. In this case, the use of preconditioning speeds up convergence by only a small factor. We have tested this across a range of systems (including various basis set sizes, and both equilibrium and stretched regimes), and find that convergence is typically very similar between the two algorithms, by this metric.

It is important to understand why this is. Clearly, preconditioning is well established as improving the convergence rate considerably. As a function of number of iterations, convergence is greatly sped up in FCIQMC. However in this stochastic setting the cost of each iteration scales strongly with the step size. This is dictated by the need to avoid large bloom events, to prevent large noise. Therefore, it is important to investigate bloom events more carefully. The unsigned weight of a spawned walker in the original algorithm is proportional to $\frac{1}{{P_{\textrm{gen}}}(j\leftarrow i)}|H_{ji}|$ , where ${P_{\textrm{gen}}}(j\leftarrow i)$ is the probability of choosing determinant $|D_{j}\rangle$ , given spawning from $|D_{i}\rangle$ . With preconditioning this becomes proportional to $\frac{1}{{P_{\textrm{gen}}}(j\leftarrow i)}\,|\frac{H_{ji}}{E-H_{jj}}|$ . Therefore the choice of ${P_{\textrm{gen}}}(j\leftarrow i)$ is critical in determining the number of bloom events. In the original algorithm, the best choice of ${P_{\textrm{gen}}}(j\leftarrow i)$ (allowing the maximum $\Delta\tau$ without bloom events) is given by

$${P_{\textrm{gen}}}(j\leftarrow i)=\frac{|H_{ji}|}{\sum_{k\neq i}|H_{ki}|}.\tag{38}$$

It is far too expensive to achieve this distribution exactly, but several schemes have been proposed to achieve this approximately. These include the heat bath approach of Holmes *et al.* (Holmes, Changlani, and Umrigar, 2016), and approaches based on the Cauchy-Schwarz inequality (suggested by Alavi and co-workers [Smart, Booth, and Alavi] and investigated recently by Neufeld and Thom [Neufeld and Thom, 2019]). For preconditioned FCIQMC, the optimal choice of ${P_{\textrm{gen}}}(j\leftarrow i)$ will be

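$${P_{\textrm{gen}}}(j\leftarrow i)=\frac{\left|\frac{H_{ji}}{E-H_{jj}}\right|}{\sum_{k\neq i}\left|\frac{H_{ki}}{E-H_{kk}}\right|}.\qquad(39)$$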

Therefore, optimal preconditioning requires a very different excitation generator to the original approach. In this study we have used Cauchy-Schwarz-based excitation generators implemented in NECI, designed to approximately achieve Eq. (38), and so the above comparison gives a significant advantage to the original scheme. To see the problem, consider the simple example of water in a cc-pVDZ basis set with a frozen core. In this case, the correlation energy is $-0.215$ ${E_{\textrm{h}}}$ , and so any walker spawned to the HF determinant is amplified by a factor of $\sim 5$ by the preconditioner. Meanwhile, the largest value of $|E-H_{jj}|$ from a test simulation was $\sim 23$ . Therefore the ratio of largest to smallest value of $|E-H_{jj}|$ is $\sim 100$ , and ideally we would like to make spawning to low-energy determinants $\sim 10$ times more likely, and spawning to high-energy determinants $\sim 10$ times less likely, relative to the current scheme. Doing so would allow the time step to be larger, and therefore convergence and autocorrelation times shorter, by this same factor (which will be system dependent). Alternatively, one could keep the time step fixed and reduce ${N_{\textrm{spawn}}}$ by this factor, which would be particularly useful when the correlation length is short, or $\Delta\tau$ close to $1$ already.
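The arithmetic in the water example, and the effect of the excitation generator on the largest spawned weight, can be sketched numerically. The block below is a toy illustration only: the arrays `h` and `d` are hypothetical stand-ins for $|H_{ji}|$ and $|E-H_{jj}|$ for a single parent determinant, not data from NECI. Reading the quoted factor of $\sim 10$ as the square root of the ratio $\sim 100$ is also our interpretation.

```python
import numpy as np

# Numbers quoted in the text for water / cc-pVDZ with a frozen core (in E_h).
e_corr = 0.215   # |E - H_jj| at the HF determinant
d_max = 23.0     # largest |E - H_jj| seen in the test simulation

amplification = 1.0 / e_corr   # about 4.7, quoted as ~5
ratio = d_max / e_corr         # about 107, quoted as ~100
rebalance = np.sqrt(ratio)     # about 10 (assumed reading of the "~10 times" factor)

# Hypothetical toy data for one parent determinant i.
rng = np.random.default_rng(0)
n_conn = 1000
h = rng.uniform(0.01, 0.1, n_conn)        # stand-in for |H_ji|
d = np.geomspace(e_corr, d_max, n_conn)   # stand-in for |E - H_jj|


def spawn_weights(p_gen):
    """Preconditioned spawn weights |H_ji| / (P_gen * |E - H_jj|)."""
    return h / (p_gen * d)


# Generator that is ideal for the original algorithm: P_gen proportional to |H_ji|.
p_original = h / h.sum()
# Generator that is ideal with preconditioning: P_gen proportional to |H_ji / (E - H_jj)|.
p_precond = (h / d) / (h / d).sum()

w_original = spawn_weights(p_original).max()
w_precond = spawn_weights(p_precond).max()

# Factor by which the time step could grow before the largest spawn
# reaches the same size as with the unmodified generator.
gain = w_original / w_precond
print(f"amplification={amplification:.2f}, ratio={ratio:.0f}, gain={gain:.1f}")
```

With the preconditioned generator every spawned weight is identical, so no single spawning event dominates. With the original generator the largest weight belongs to the determinant with the smallest $|E-H_{jj}|$, which is what limits the usable time step.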

Therefore, there is substantial potential to speed up the FCIQMC algorithm by modifying excitation generators for the preconditioned case. The design and optimization of excitation generators is an extensive task, which we do not consider here. Nonetheless, there is clearly a benefit to be gained in future work through this approach.

Lastly, in the above analysis we assumed that the iteration time scaled proportionally to ${N_{\textrm{spawn}}}$ . Actually, a large value of ${N_{\textrm{spawn}}}$ is often more efficient than this. This is because some parts of the algorithm (such as the death step and deterministic projection) are independent of ${N_{\textrm{spawn}}}$ . For example, for benzene as studied in Section VI.5, the average iteration time divided by ${N_{\textrm{spawn}}}$ is equal to 0.56 seconds without preconditioning and ${N_{\textrm{spawn}}}=1$ , while with preconditioning and ${N_{\textrm{spawn}}}=150$ this value is equal to 0.41 seconds (with ${N_{\textrm{w}}}=1.28\times 10^{7}$ in both cases). There are further ways in which large- ${N_{\textrm{spawn}}}$ FCIQMC can be made more efficient, as discussed in the conclusion.
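As a rough illustration of why the cost per spawning attempt can fall as ${N_{\textrm{spawn}}}$ grows, the sketch below fits a two-parameter model, $t_{\textrm{iter}}=t_{\textrm{fixed}}+{N_{\textrm{spawn}}}\,t_{\textrm{spawn}}$, to the two benzene timings quoted above. The linear model and both parameter names are assumptions made for illustration. The two data points also come from different algorithms (with and without preconditioning), so the fitted values should not be read as measurements.

```python
# Per-spawn iteration times quoted in the text for benzene (seconds).
t_per_spawn_original = 0.56   # N_spawn = 1, no preconditioning
t_per_spawn_precond = 0.41    # N_spawn = 150, with preconditioning
n_spawn_precond = 150

# Relative saving in time per spawning attempt.
saving = 1.0 - t_per_spawn_precond / t_per_spawn_original

# Implied wall time of one full preconditioned iteration.
t_iter_precond = t_per_spawn_precond * n_spawn_precond


def time_per_spawn(n_spawn, t_fixed, t_spawn):
    """Hypothetical cost model: t_iter = t_fixed + n_spawn * t_spawn."""
    return (t_fixed + n_spawn * t_spawn) / n_spawn


# Solve the two-point fit for the two hypothetical parameters.
t_spawn = (t_iter_precond - t_per_spawn_original) / (n_spawn_precond - 1)
t_fixed = t_per_spawn_original - t_spawn

print(f"saving per spawn: {saving:.1%}")
print(f"fitted t_fixed={t_fixed:.3f} s, t_spawn={t_spawn:.3f} s")
```

In this model the fixed cost is paid once per iteration, so its share of the per-spawn time shrinks as $1/{N_{\textrm{spawn}}}$.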

VI.5 Benzene

As an application of this approach to a larger system, we consider benzene in a cc-pVDZ basis set with a frozen core $(30\textrm{e},108\textrm{o})$ , using the geometry of Ref. (Schreiber et al., 2008). This is an example that would have been too challenging to study accurately with FCIQMC previously, even with significant computational resources, and so provides a good test.

Without preconditioning, parameters $\Delta\tau=1.9\times 10^{-4}$ au and ${N_{\textrm{spawn}}}=1$ are chosen. With preconditioning, we take $\Delta\tau=0.1$ au and ${N_{\textrm{spawn}}}=150$. Both simulations used $1.28\times 10^{7}$ walkers and were run for $11$ hours on $10$ nodes of $32$ cores each, with $384$ GB of RAM per node. These resources are modest compared to large-scale FCIQMC, which can be scaled up to more than $10^{9}$ walkers and $\sim 10^{4}$ CPU cores with appropriate load balancing (Spencer et al., 2018). Fig. (7) presents the convergence of ${E_{\textrm{ref}}}$, ${E_{\textrm{var}}}$ and ${E_{\textrm{var+PT2}}}$ in both approaches. Table 2 presents final estimates, averaged from convergence onwards, and compared to high-order coupled cluster.

Initiator error (relative to CCSDT(Q)) on ${E_{\textrm{ref}}}$ and ${E_{\textrm{var}}}$ is roughly unchanged by the use of preconditioning. The estimate from ${E_{\textrm{ref}}}$ is too high by $\sim 23$ ${\textrm{m}E_{\textrm{h}}}$, while the estimate from ${E_{\textrm{var}}}$ is too high by $\sim 41$ ${\textrm{m}E_{\textrm{h}}}$. With ${N_{\textrm{spawn}}}=1$, the noise on ${E_{\textrm{var+PT2}}}$ is too large to be useful. This is clear from Fig. (7), where fluctuations from iteration to iteration are of size $\sim 20$ ${E_{\textrm{h}}}$. The final averaged value in Table 2 has a stochastic error of $14$ ${\textrm{m}E_{\textrm{h}}}$, and no reliable conclusion can be drawn. Using ${N_{\textrm{spawn}}}=150$, the noise is reduced substantially. Sensible convergence is seen, and the final estimate from ${E_{\textrm{var+PT2}}}$ has a statistical error of $1$ ${\textrm{m}E_{\textrm{h}}}$. This energy lies between the CCSDT[Q] and CCSDT(Q) results. Continuing the preconditioned simulation for a further $11$ hours increases the ${E_{\textrm{var+PT2}}}$ estimate to $-231.5825(7)$ ${E_{\textrm{h}}}$. Therefore, we suspect that the true ${E_{\textrm{var+PT2}}}$ estimate (in the limit of zero statistical error) is slightly higher than that given in Table 2. The results here are not intended to be accurately converged FCI benchmarks, but estimates to assess the improvement made by ${E_{\textrm{var+PT2}}}$ and the large-${N_{\textrm{spawn}}}$ approach. In this respect the approach described here makes a significant improvement over the previous method.
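The quoted error bars can be checked against the usual Monte Carlo expectation that the statistical error falls as the inverse square root of the sampling time. The sketch below assumes that scaling holds and ignores the equilibration period, since averages are taken from convergence onwards. The variable names are ours.

```python
import math

# Statistical errors quoted in the text for the preconditioned run (in E_h).
err_first_run = 1.0e-3   # after the first 11 hours
err_extended = 0.7e-3    # from -231.5825(7), after a further 11 hours

# Doubling the sampling time should shrink the error by sqrt(2).
err_predicted = err_first_run / math.sqrt(2.0)

# Error quoted for the N_spawn = 1 run after the same 11 hours (in E_h).
err_original = 14.0e-3

# Factor by which the N_spawn = 1 run would have to be extended to reach
# the 1 mE_h error bar, under the same 1/sqrt(time) assumption.
time_factor = (err_original / err_first_run) ** 2

print(f"predicted error after 22 h: {err_predicted * 1e3:.2f} mE_h")
print(f"run-time factor for N_spawn = 1: {time_factor:.0f}")
```

The predicted error after doubling the run time is close to the quoted $0.7$ ${\textrm{m}E_{\textrm{h}}}$. Under the same assumption, matching the preconditioned error bar with ${N_{\textrm{spawn}}}=1$ would need a run roughly two hundred times longer.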

VII Conclusion

It has been demonstrated that FCIQMC can be performed with a preconditioner, in contrast to the traditional imaginary-time propagation, allowing time steps close to unity to be used. This results in a method which can typically converge within $20$ to $30$ iterations, while the original method typically requires at least several thousand iterations. In practice, the requirement that bloom events be avoided means that a large ${N_{\textrm{spawn}}}$ must also be chosen. As a result, reductions in simulation time to convergence are rather more limited. This can be traced to the fact that currently-used excitation generators are optimized for imaginary-time propagation, and must be modified in the presence of a preconditioner. This will be an area for future work, and could greatly improve the speed of the method.
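The difference in iteration counts can be reproduced in a deterministic toy setting. The sketch below is not FCIQMC: there is no stochastic spawning, the matrix is a hypothetical tridiagonal model, and the reference coefficient is held at $1$ so that the projected energy can serve as the shift. It compares plain imaginary-time projection at a small $\Delta\tau$ with a diagonally preconditioned update at $\Delta\tau=1$.

```python
import numpy as np


def build_toy_hamiltonian(n=20, coupling=-0.1):
    """Hypothetical test matrix: H_ii = i with nearest-neighbour coupling."""
    h = np.diag(np.arange(n, dtype=float))
    off = np.full(n - 1, coupling)
    return h + np.diag(off, 1) + np.diag(off, -1)


def iterate(h, dtau, preconditioned, tol=1e-8, max_iter=10_000):
    """Deterministic projection with the reference coefficient fixed at 1.

    Returns the projected energy and the number of update steps taken.
    """
    n = h.shape[0]
    c = np.zeros(n)
    c[0] = 1.0
    diag = np.diag(h)
    for n_iter in range(max_iter):
        hc = h @ c
        energy = hc[0]               # projected energy, valid because c[0] == 1
        residual = hc - energy * c   # (H - E) C; its reference component is zero
        if np.linalg.norm(residual) < tol:
            return energy, n_iter
        step = dtau * residual[1:]
        if preconditioned:
            # Diagonal preconditioner 1 / (H_ii - E) on non-reference components.
            step = step / (diag[1:] - energy)
        c[1:] -= step
    raise RuntimeError("iteration did not converge")


h = build_toy_hamiltonian()
e_exact = np.linalg.eigvalsh(h)[0]

# dtau = 0.05 keeps the plain update stable for this spectral width (about 19).
e_plain, n_plain = iterate(h, dtau=0.05, preconditioned=False)
e_pre, n_pre = iterate(h, dtau=1.0, preconditioned=True)

print(f"imaginary-time projection: {n_plain} iterations")
print(f"preconditioned update:     {n_pre} iterations")
```

The plain update is limited by the ratio of the spectral width to the gap, whereas the preconditioned update treats every determinant on its own energy scale. In the stochastic algorithm this gain is offset by the larger ${N_{\textrm{spawn}}}$ needed to suppress blooms, as discussed above.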

However, it has been shown that the use of a large ${N_{\textrm{spawn}}}$ is a dramatic benefit for the calculations of perturbative corrections to initiator error. Such perturbative corrections improve the accuracy of the method dramatically, yet are almost free to calculate from rejected spawned walkers, so that we regard this as a clear improvement to FCIQMC. These improvements have been demonstrated for benzene $(30\textrm{e},108\textrm{o})$ , which is certainly not feasible with the original i-FCIQMC algorithm, and where the PT2 correction was too noisy in the previous approach. Thus, while the preconditioned approach does not speed up convergence as one might expect, it is a significant benefit in the calculation of PT2 corrections to initiator error.

In practice, we also find that performing multiple spawning attempts per walker is more efficient in terms of iteration time per spawned walker. In future work, there are obvious ways in which the algorithm could be made more efficient in the large- ${N_{\textrm{spawn}}}$ case. For example:

• In semi-stochastic FCIQMC, the deterministic projection is performed once per iteration. When performing a large number of cheap iterations (small $\Delta\tau$ and ${N_{\textrm{spawn}}}$) this projection becomes too expensive beyond a deterministic space size of order $\sim 10^{5}$ or so. Using a small number of expensive iterations (large $\Delta\tau$ and ${N_{\textrm{spawn}}}$), a much larger deterministic space could be used without this projection becoming the limiting cost.

• The use of large ${N_{\textrm{spawn}}}$ should make it more efficient to perform more of the algorithm deterministically, beyond the semi-stochastic approach. For example, for a determinant with $|C_{i}|$ walkers, it may be more efficient to generate all connected determinants and create spawnings accordingly, rather than calling the excitation generator $|C_{i}|\times{N_{\textrm{spawn}}}$ times, particularly if this number is similar to or larger than the number of connected determinants (as is sometimes the case).
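The second suggestion amounts to comparing two counts per determinant. The sketch below is a hypothetical heuristic, not an existing NECI feature: the function names and the `threshold` parameter are ours. The count of connected determinants is a simple combinatorial upper bound that ignores spatial symmetry and vanishing matrix elements.

```python
from math import comb


def n_connected_upper_bound(n_alpha, n_beta, n_orb):
    """Upper bound on single and double excitations from one determinant.

    Ignores point-group symmetry and zero matrix elements.
    """
    virt_a = n_orb - n_alpha
    virt_b = n_orb - n_beta
    singles = n_alpha * virt_a + n_beta * virt_b
    same_spin = comb(n_alpha, 2) * comb(virt_a, 2) + comb(n_beta, 2) * comb(virt_b, 2)
    opposite_spin = (n_alpha * virt_a) * (n_beta * virt_b)
    return singles + same_spin + opposite_spin


def use_exact_spawning(walker_count, n_spawn, n_connected, threshold=1.0):
    """Hypothetical rule: enumerate all connections once the number of
    excitation-generator calls would reach threshold * n_connected."""
    attempts = abs(walker_count) * n_spawn
    return attempts >= threshold * n_connected


# Benzene (30e, 108o) as in the text, with 15 alpha and 15 beta electrons.
n_conn = n_connected_upper_bound(15, 15, 108)
print(n_conn)

# With N_spawn = 150, find roughly where the two costs cross.
for walkers in (100, 10_000, 20_000):
    print(walkers, use_exact_spawning(walkers, 150, n_conn))
```

For this bound the crossover at ${N_{\textrm{spawn}}}=150$ falls at a little under $2\times 10^{4}$ walkers on a single determinant, a population that only the most heavily occupied determinants would be expected to reach. Symmetry would lower the number of connections and hence the crossover.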

Combined with an excitation generator optimized for the preconditioned algorithm, such modifications could lead to a much faster algorithm. The use of perturbative estimators already allows a new range of systems to be studied accurately by the method. The implementation of these additional developments should allow the method to push further still in the near future.

Acknowledgements.

N.S.B. is grateful to St John’s College, Cambridge for funding and supporting this work through a Research Fellowship. A.J.W.T. thanks the Royal Society for a University Research Fellowship under grant UF160398. C.J.C.S. is grateful to the Sims Fund for a studentship. This study made use of the CSD3 Peta4 CPU cluster.

Appendix A A more rigorous PT2 correction to initiator error

As discussed in the text, a second-order perturbative correction to initiator error can be calculated by

[TABLE]

where the summation is performed over all spawnings which are cancelled due to the initiator criterion, and there is a normalization factor of $\langle\Psi^{1}|\Psi^{2}\rangle$. This was motivated by selected CI methods. In such approaches, the Hamiltonian is diagonalized exactly in a variational subspace, allowing a perturbative correction to be calculated with an Epstein-Nesbet partitioning, and an analogous derivation was used (Blunt, 2018) to obtain Eq. (16). However, the correction to i-FCIQMC is less rigorous because it is not simple to define a zeroth-order space, and it is not clear that the FCIQMC wave function exactly samples the corresponding ground state.

To make the correction more rigorous, we present a second-order perturbative correction without considering a truncated space. This is the same approach used recently by Guo, Li and Chan (Guo, Li, and Chan, 2018b) and also by Sharma (Sharma, 2018) to perturbatively correct a DMRG wave function, and we follow their idea.

Consider partitioning the Hamiltonian so that $\hat{H}=\hat{H}_{0}+\hat{V}$ and $\hat{H}_{0}|\Psi\rangle=E_{0}|\Psi\rangle$, where $|\Psi\rangle$ will be taken as the FCIQMC wave function (dropping the conventional $0$ subscript), and define $E_{0}=\langle\Psi|\hat{H}|\Psi\rangle$. The second-order energy correction can be obtained as

[TABLE]

where $\hat{Q}=\mathbb{1}-|\Psi\rangle\langle\Psi|$ . An appropriate $\hat{H}_{0}$ can be defined by

[TABLE]

where $\hat{P}=|\Psi\rangle\langle\Psi|$ and $\langle D_{i}|\hat{H}_{d}|D_{j}\rangle=\delta_{ij}\langle D_{i}|\hat{H}|D_{j}\rangle$ consists of the diagonal elements of $\hat{H}$ in the FCIQMC basis.

It can be shown that $\hat{Q}\hat{V}|\Psi\rangle=(\hat{H}-E_{0})|\Psi\rangle$ , and so

[TABLE]
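The identity $\hat{Q}\hat{V}|\Psi\rangle=(\hat{H}-E_{0})|\Psi\rangle$ used here follows in a few steps from the definitions above. The worked lines below assume that $|\Psi\rangle$ is normalized, so that $\hat{Q}|\Psi\rangle=0$, and use $\hat{V}=\hat{H}-\hat{H}_{0}$ together with $\hat{H}_{0}|\Psi\rangle=E_{0}|\Psi\rangle$ and $E_{0}=\langle\Psi|\hat{H}|\Psi\rangle$:

```latex
\begin{align}
\hat{Q}\hat{V}|\Psi\rangle
  &= \hat{Q}\,(\hat{H}-\hat{H}_{0})\,|\Psi\rangle \\
  &= \hat{Q}\hat{H}|\Psi\rangle - E_{0}\,\hat{Q}|\Psi\rangle \\
  &= \left(\mathbb{1}-|\Psi\rangle\langle\Psi|\right)\hat{H}|\Psi\rangle \\
  &= \hat{H}|\Psi\rangle - |\Psi\rangle\,\langle\Psi|\hat{H}|\Psi\rangle \\
  &= (\hat{H}-E_{0})\,|\Psi\rangle .
\end{align}
```

The result is the residual of $|\Psi\rangle$ with respect to $\hat{H}$, which vanishes when $|\Psi\rangle$ is an exact eigenstate.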

To proceed, one can make the approximation

[TABLE]

to avoid calculating the inverse of $[\hat{H}_{0}-E_{0}]$ , which is not diagonal in the FCIQMC basis. This will be a very good approximation in this case, as $(\hat{H}-E_{0})|\Psi\rangle$ will be approximately zero in the space of occupied determinants. This inverse can be treated more carefully, as in Ref. (Guo, Li, and Chan, 2018b), but the above is more than sufficient for the FCIQMC case.

To put this in a form similar to that found for ${E_{\textrm{var+PT2}}^{\textrm{new}}}$ , we split the Hamiltonian into diagonal and off-diagonal components, $\hat{H}=\hat{H}_{d}+\hat{H}_{\textrm{off}}$ . Then,

[TABLE]

where we used the definition of the zeroth-order energy, $\langle\Psi|(\hat{H}-E_{0})|\Psi\rangle=0$ . Finally, using $E_{0}=\langle\Psi|\hat{H}_{d}+\hat{H}_{\textrm{off}}|\Psi\rangle$ ,

[TABLE]

The spawned vector in FCIQMC can be written $S_{i}=-\Delta\tau\langle D_{i}|\hat{H}_{\textrm{off}}|\Psi\rangle$ , giving a final expression for the perturbatively corrected energy estimator in FCIQMC:

[TABLE]

We see that this expression includes all the terms in the perturbative correction of Eq. (16), and is almost identical to the expression for $\langle\Phi|\hat{H}|\Psi\rangle$ obtained in Eq. (24). The only difference is in the final term:

[TABLE]

where we have used the definition of $\Phi_{i}$ in Eq. (19), and included the required normalization factors. If $|\Psi\rangle$ is approximately an eigenstate of $\hat{H}$ , then $C_{i}\approx\Phi_{i}$ , and the two estimators will give very similar results. Indeed, we find in practice that results from the two estimators are essentially identical after convergence.

Bibliography


1. Raghavachari et al. (1989): K. Raghavachari, G. W. Trucks, J. A. Pople, and M. Head-Gordon, Chemical Physics Letters 157, 479 (1989).
2. Bartlett and Musiał (2007): R. J. Bartlett and M. Musiał, Rev. Mod. Phys. 79, 291 (2007).
3. Riplinger and Neese (2013): C. Riplinger and F. Neese, J. Chem. Phys. 138, 034106 (2013).
4. Li et al. (2009): W. Li, P. Piecuch, J. R. Gour, and S. Li, J. Chem. Phys. 131, 114109 (2009).
5. Eriksen et al. (2015): J. J. Eriksen, P. Baudin, P. Ettenhuber, K. Kristensen, T. Kjærgaard, and P. Jørgensen, J. Chem. Theory Comput. 11, 2984 (2015).
6. White (1992): S. R. White, Phys. Rev. Lett. 69, 2863 (1992).
7. Chan (2004): G. K.-L. Chan, J. Chem. Phys. 120, 3172 (2004).
8. Olivares-Amaya et al. (2015): R. Olivares-Amaya, W. Hu, N. Nakatani, S. Sharma, J. Yang, and G. K.-L. Chan, J. Chem. Phys. 142, 034102 (2015).