Fast optical absorption spectra calculations for periodic solid state systems

F. Henneke; L. Lin; C. Vorwerk; C. Draxl; R. Klein; C. Yang

arXiv:1907.02827·physics.comp-ph·June 24, 2020

TL;DR

The paper introduces an efficient method for calculating optical absorption spectra in periodic solids by approximating the Bethe-Salpeter Hamiltonian, significantly reducing computational costs while maintaining accuracy.

Contribution

It develops a novel interpolative separable density fitting technique to construct the Bethe-Salpeter Hamiltonian with near-linear scaling, enabling faster spectrum calculations.

Findings

01 Achieves nearly linear scaling in Hamiltonian construction.
02 Reduces computation time from hours to minutes for large systems.
03 Maintains accuracy comparable to brute force methods.

Abstract

We present a method to construct an efficient approximation to the bare exchange and screened direct interaction kernels of the Bethe-Salpeter Hamiltonian for periodic solid state systems via the interpolative separable density fitting technique. We show that the cost of constructing the approximate Bethe-Salpeter Hamiltonian scales nearly optimally as $\mathcal{O}(N_{k})$ with respect to the number of samples in the Brillouin zone $N_{k}$. In addition, we show that the cost for applying the Bethe-Salpeter Hamiltonian to a vector scales as $\mathcal{O}(N_{k}\log N_{k})$. Therefore the optical absorption spectrum, as well as selected excitation energies can be efficiently computed via iterative methods such as the Lanczos method. This is a significant reduction from the $\mathcal{O}(N_{k}^{2})$ and $\mathcal{O}(N_{k}^{3})$ scaling associated with a brute force approach for constructing the Hamiltonian…

Tables

Table 1: Parameters used in the computation of spectra and the benchmarks.

Parameters	Diamond	Graphene
$N_{v}$	$4$	$4$
$N_{c}$	$10$	$5$
$N_{k}$	$13 \times 13 \times 13$	$42 \times 42 \times 1$
$N_{r}$	$20 \times 20 \times 20$	$15 \times 15 \times 50$
$N_{\mu}^{vv}$	$70$	$50$
$N_{\mu}^{cc}$	$220$	$180$
$N_{\mu}^{vc}$	$100$	$60$
$N_{\text{iter}}$	$150$	$100$

Table 2: Errors in the spectrum for differently accurate ISDF approximations.

Error in $Z$	Error in absorption function	Error in first eigenvalue
$0.5$	$0.199$	$0.0038$ ($20.7$ meV)
$0.1$	$0.056$	$0.0011$ ($6.2$ meV)
$0.05$	$0.040$	$0.0006$ ($3.3$ meV)

Equations

\mathbb{L}=\left\{\mathbf{R}~\Big|~\mathbf{R}=n_{1}\mathbf{a}_{1}+n_{2}\mathbf{a}_{2}+n_{3}\mathbf{a}_{3},\ n_{1},n_{2},n_{3}\in\mathbb{Z}\right\}.

V_{\text{eff}}(\mathbf{r}+\mathbf{R})=V_{\text{eff}}(\mathbf{r}),\quad\forall\mathbf{r}\in\mathbb{R}^{3},\ \mathbf{R}\in\mathbb{L}.

\Omega=\left\{\mathbf{r}=c_{1}\mathbf{a}_{1}+c_{2}\mathbf{a}_{2}+c_{3}\mathbf{a}_{3}~\Big|~0\leq c_{1},c_{2},c_{3}<1\right\}.

\Omega^{*}=\left\{\mathbf{k}=k_{1}\mathbf{b}_{1}+k_{2}\mathbf{b}_{2}+k_{3}\mathbf{b}_{3}~\Big|~-\frac{1}{2}\leq k_{1},k_{2},k_{3}<\frac{1}{2}\right\}.

\psi_{i\mathbf{k}}(\mathbf{r})=e^{\mathrm{i}\mathbf{k}\cdot\mathbf{r}}u_{i\mathbf{k}}(\mathbf{r}),

u_{i\mathbf{k}}(\mathbf{r}+\mathbf{R})=u_{i\mathbf{k}}(\mathbf{r}),\quad\forall\mathbf{R}\in\mathbb{L}.

\mathcal{H}(\mathbf{k})u_{i\mathbf{k}}=\epsilon_{i\mathbf{k}}u_{i\mathbf{k}}(\mathbf{r}),\quad\mathbf{r}\in\Omega,\ \mathbf{k}\in\Omega^{*},

\inf\lvert\epsilon_{i\mathbf{k}}-\epsilon_{i^{\prime}\mathbf{k}^{\prime}}\rvert:=\epsilon_{g}>0,\quad\mathbf{k},\mathbf{k}^{\prime}\in\Omega^{*},\ 1\leq i\leq N_{v},\ N_{v}+1\leq i^{\prime}\leq N.

\lvert\Omega^{*}\rvert=\frac{(2\pi)^{3}}{\lvert\Omega\rvert}

\int_{\mathbb{R}^{3}}\psi^{*}_{i^{\prime}\mathbf{k}^{\prime}}(\mathbf{r})\psi_{i,\mathbf{k}}(\mathbf{r})\,\mathrm{d}\mathbf{r}=\lvert\Omega^{*}\rvert\,\delta_{i^{\prime},i}\,\delta(\mathbf{k}^{\prime}-\mathbf{k}).

\frac{1}{\lvert\Omega^{*}\rvert}\int_{\Omega^{*}}\int_{\mathbb{R}^{3}}\psi^{*}_{i^{\prime}\mathbf{k}}(\mathbf{r})\psi_{i\mathbf{k}}(\mathbf{r})\,\mathrm{d}\mathbf{r}\,\mathrm{d}\mathbf{k}=\delta_{i^{\prime},i}.

\rho(\mathbf{r})=\frac{1}{\lvert\Omega^{*}\rvert}\int_{\Omega^{*}}\sum_{i=1}^{N_{v}}\lvert\psi_{i\mathbf{k}}(\mathbf{r})\rvert^{2}\,\mathrm{d}\mathbf{k}=\frac{1}{\lvert\Omega^{*}\rvert}\int_{\Omega^{*}}\sum_{i=1}^{N_{v}}\lvert u_{i\mathbf{k}}(\mathbf{r})\rvert^{2}\,\mathrm{d}\mathbf{k}.

\mathcal{K}^{\ell}_{\mathbf{s}}=\left\{\sum_{\alpha=1}^{3}\frac{m_{\alpha}-s_{\alpha}}{N^{\ell}_{\alpha}}\mathbf{b}_{\alpha}\;\Big|\;m_{\alpha}=-\frac{N^{\ell}_{\alpha}}{2}+1,\ldots,\frac{N^{\ell}_{\alpha}}{2},\quad 0\leq s_{\alpha}<1,\quad\alpha=1,2,3\right\}.

\int_{\Omega^{\ell}}\psi^{*}_{i^{\prime}\mathbf{k}^{\prime}}(\mathbf{r})\psi_{i\mathbf{k}}(\mathbf{r})\,\mathrm{d}\mathbf{r}=\delta_{i^{\prime},i}\,\delta_{\mathbf{k}^{\prime},\mathbf{k}},\quad\mathbf{k},\mathbf{k}^{\prime}\in\mathcal{K}^{\ell}.

\psi_{i\mathbf{k}}(\mathbf{r})=\frac{1}{\sqrt{N^{\ell}}}e^{\mathrm{i}\mathbf{k}\cdot\mathbf{r}}u_{i\mathbf{k}}(\mathbf{r}),\quad\mathbf{k}\in\mathcal{K}^{\ell}.

\int_{\Omega}u^{*}_{i^{\prime}\mathbf{k}}(\mathbf{r})u_{i\mathbf{k}}(\mathbf{r})\,\mathrm{d}\mathbf{r}=\delta_{i^{\prime},i},\quad\mathbf{k}\in\mathcal{K}^{\ell}.

H_{\text{BSE}}X=EX,

H_{\text{BSE}}=\begin{bmatrix}D+2V_{A}-W_{A}&2V_{B}-W_{B}\\-2\overline{V}_{B}+\overline{W}_{B}&-D-2\overline{V}_{A}+\overline{W}_{A}\end{bmatrix},

\begin{aligned}
V_{A}(i_{v}i_{c}\mathbf{k},j_{v}j_{c}\mathbf{k}^{\prime})&=\int_{\Omega^{\ell}\times\Omega^{\ell}}\bar{\psi}_{i_{c}\mathbf{k}}(\mathbf{r})\psi_{i_{v}\mathbf{k}}(\mathbf{r})V(\mathbf{r},\mathbf{r}^{\prime})\bar{\psi}_{j_{v}\mathbf{k}^{\prime}}(\mathbf{r}^{\prime})\psi_{j_{c}\mathbf{k}^{\prime}}(\mathbf{r}^{\prime})\,\mathrm{d}\mathbf{r}\,\mathrm{d}\mathbf{r}^{\prime},\\
V_{B}(i_{v}i_{c}\mathbf{k},j_{v}j_{c}\mathbf{k}^{\prime})&=\int_{\Omega^{\ell}\times\Omega^{\ell}}\bar{\psi}_{i_{c}\mathbf{k}}(\mathbf{r})\psi_{i_{v}\mathbf{k}}(\mathbf{r})V(\mathbf{r},\mathbf{r}^{\prime})\bar{\psi}_{j_{c}\mathbf{k}^{\prime}}(\mathbf{r}^{\prime})\psi_{j_{v}\mathbf{k}^{\prime}}(\mathbf{r}^{\prime})\,\mathrm{d}\mathbf{r}\,\mathrm{d}\mathbf{r}^{\prime},\\
W_{A}(i_{v}i_{c}\mathbf{k},j_{v}j_{c}\mathbf{k}^{\prime})&=\int_{\Omega^{\ell}\times\Omega^{\ell}}\bar{\psi}_{i_{c}\mathbf{k}}(\mathbf{r})\psi_{j_{c}\mathbf{k}^{\prime}}(\mathbf{r})W(\mathbf{r},\mathbf{r}^{\prime})\bar{\psi}_{j_{v}\mathbf{k}^{\prime}}(\mathbf{r}^{\prime})\psi_{i_{v}\mathbf{k}}(\mathbf{r}^{\prime})\,\mathrm{d}\mathbf{r}\,\mathrm{d}\mathbf{r}^{\prime},\\
W_{B}(i_{v}i_{c}\mathbf{k},j_{v}j_{c}\mathbf{k}^{\prime})&=\int_{\Omega^{\ell}\times\Omega^{\ell}}\bar{\psi}_{i_{c}\mathbf{k}}(\mathbf{r})\psi_{j_{v}\mathbf{k}^{\prime}}(\mathbf{r})W(\mathbf{r},\mathbf{r}^{\prime})\bar{\psi}_{j_{c}\mathbf{k}^{\prime}}(\mathbf{r}^{\prime})\psi_{i_{v}\mathbf{k}}(\mathbf{r}^{\prime})\,\mathrm{d}\mathbf{r}\,\mathrm{d}\mathbf{r}^{\prime}.
\end{aligned}

\begin{aligned}
V_{A}(i_{v}i_{c}\mathbf{k},j_{v}j_{c}\mathbf{k}^{\prime})&=\frac{1}{N_{k}^{2}}\int_{\Omega^{\ell}\times\Omega^{\ell}}\bar{u}_{i_{c}\mathbf{k}}(\mathbf{r})u_{i_{v}\mathbf{k}}(\mathbf{r})V(\mathbf{r},\mathbf{r}^{\prime})\bar{u}_{j_{v}\mathbf{k}^{\prime}}(\mathbf{r}^{\prime})u_{j_{c}\mathbf{k}^{\prime}}(\mathbf{r}^{\prime})\,\mathrm{d}\mathbf{r}\,\mathrm{d}\mathbf{r}^{\prime},\\
V_{B}(i_{v}i_{c}\mathbf{k},j_{v}j_{c}\mathbf{k}^{\prime})&=\frac{1}{N_{k}^{2}}\int_{\Omega^{\ell}\times\Omega^{\ell}}\bar{u}_{i_{c}\mathbf{k}}(\mathbf{r})u_{i_{v}\mathbf{k}}(\mathbf{r})V(\mathbf{r},\mathbf{r}^{\prime})\bar{u}_{j_{c}\mathbf{k}^{\prime}}(\mathbf{r}^{\prime})u_{j_{v}\mathbf{k}^{\prime}}(\mathbf{r}^{\prime})\,\mathrm{d}\mathbf{r}\,\mathrm{d}\mathbf{r}^{\prime},\\
W_{A}(i_{v}i_{c}\mathbf{k},j_{v}j_{c}\mathbf{k}^{\prime})&=\frac{1}{N_{k}^{2}}\int_{\Omega^{\ell}\times\Omega^{\ell}}e^{-\mathrm{i}(\mathbf{k}-\mathbf{k}^{\prime})\cdot(\mathbf{r}-\mathbf{r}^{\prime})}\bar{u}_{i_{c}\mathbf{k}}(\mathbf{r})u_{j_{c}\mathbf{k}^{\prime}}(\mathbf{r})W(\mathbf{r},\mathbf{r}^{\prime})\bar{u}_{j_{v}\mathbf{k}^{\prime}}(\mathbf{r}^{\prime})u_{i_{v}\mathbf{k}}(\mathbf{r}^{\prime})\,\mathrm{d}\mathbf{r}\,\mathrm{d}\mathbf{r}^{\prime},\\
W_{B}(i_{v}i_{c}\mathbf{k},j_{v}j_{c}\mathbf{k}^{\prime})&=\frac{1}{N_{k}^{2}}\int_{\Omega^{\ell}\times\Omega^{\ell}}e^{-\mathrm{i}(\mathbf{k}-\mathbf{k}^{\prime})\cdot(\mathbf{r}-\mathbf{r}^{\prime})}\bar{u}_{i_{c}\mathbf{k}}(\mathbf{r})u_{j_{v}\mathbf{k}^{\prime}}(\mathbf{r})W(\mathbf{r},\mathbf{r}^{\prime})\bar{u}_{j_{c}\mathbf{k}^{\prime}}(\mathbf{r}^{\prime})u_{i_{v}\mathbf{k}}(\mathbf{r}^{\prime})\,\mathrm{d}\mathbf{r}\,\mathrm{d}\mathbf{r}^{\prime}.
\end{aligned}

\mathcal{V}(f,g):=\frac{1}{N_{k}}\int_{\Omega^{\ell}\times\Omega^{\ell}}\bar{f}(\mathbf{r})V(\mathbf{r},\mathbf{r}^{\prime})g(\mathbf{r}^{\prime})\,\mathrm{d}\mathbf{r}\,\mathrm{d}\mathbf{r}^{\prime},

\mathcal{W}_{\mathbf{q}}(f,g):=\frac{1}{N_{k}}\int_{\Omega^{\ell}\times\Omega^{\ell}}e^{-\mathrm{i}\mathbf{q}\cdot(\mathbf{r}-\mathbf{r}^{\prime})}\bar{f}(\mathbf{r})W(\mathbf{r},\mathbf{r}^{\prime})g(\mathbf{r}^{\prime})\,\mathrm{d}\mathbf{r}\,\mathrm{d}\mathbf{r}^{\prime}.

\begin{aligned}
V_{A}(i_{v}i_{c}\mathbf{k},j_{v}j_{c}\mathbf{k}^{\prime})&=\frac{1}{N_{k}}\mathcal{V}(\bar{u}_{i_{v}\mathbf{k}}u_{i_{c}\mathbf{k}},\bar{u}_{j_{v}\mathbf{k}^{\prime}}u_{j_{c}\mathbf{k}^{\prime}}),\\
V_{B}(i_{v}i_{c}\mathbf{k},j_{v}j_{c}\mathbf{k}^{\prime})&=\frac{1}{N_{k}}\mathcal{V}(\bar{u}_{i_{v}\mathbf{k}}u_{i_{c}\mathbf{k}},\bar{u}_{j_{c}\mathbf{k}^{\prime}}u_{j_{v}\mathbf{k}^{\prime}}),\\
W_{A}(i_{v}i_{c}\mathbf{k},j_{v}j_{c}\mathbf{k}^{\prime})&=\frac{1}{N_{k}}\mathcal{W}_{\mathbf{k}-\mathbf{k}^{\prime}}(\bar{u}_{j_{c}\mathbf{k}^{\prime}}u_{i_{c}\mathbf{k}},\bar{u}_{j_{v}\mathbf{k}^{\prime}}u_{i_{v}\mathbf{k}}),\\
W_{B}(i_{v}i_{c}\mathbf{k},j_{v}j_{c}\mathbf{k}^{\prime})&=\frac{1}{N_{k}}\mathcal{W}_{\mathbf{k}-\mathbf{k}^{\prime}}(\bar{u}_{j_{v}\mathbf{k}^{\prime}}u_{i_{c}\mathbf{k}},\bar{u}_{j_{c}\mathbf{k}^{\prime}}u_{i_{v}\mathbf{k}}).
\end{aligned}

f(\mathbf{r})=\sum_{\mathbf{G}\in\mathbb{L}^{*}}\hat{f}(\mathbf{G})e^{\mathrm{i}\mathbf{G}\cdot\mathbf{r}},

\hat{f}(\mathbf{G})=\frac{1}{\lvert\Omega\rvert}\int_{\Omega}e^{-\mathrm{i}\mathbf{G}\cdot\mathbf{r}}f(\mathbf{r})\,\mathrm{d}\mathbf{r}.

\int_{\Omega}\bar{f}(\mathbf{r})g(\mathbf{r})\,\mathrm{d}\mathbf{r}=\lvert\Omega\rvert\sum_{\mathbf{G}\in\mathbb{L}^{*}}\bar{\hat{f}}(\mathbf{G})\hat{g}(\mathbf{G}).

V(\mathbf{r}+\mathbf{R},\mathbf{r}^{\prime}+\mathbf{R})=V(\mathbf{r},\mathbf{r}^{\prime}),\quad W(\mathbf{r}+\mathbf{R},\mathbf{r}^{\prime}+\mathbf{R})=W(\mathbf{r},\mathbf{r}^{\prime}),\quad\forall\mathbf{R}\in\mathbb{L}.

V(\mathbf{r},\mathbf{r}^{\prime})=\frac{1}{\lvert\Omega^{\ell}\rvert}\sum_{\mathbf{k}\in\mathcal{K}^{\ell}}\sum_{\mathbf{G},\mathbf{G}^{\prime}}e^{\mathrm{i}(\mathbf{k}+\mathbf{G})\cdot\mathbf{r}}\hat{V}_{\mathbf{k}}(\mathbf{G},\mathbf{G}^{\prime})e^{-\mathrm{i}(\mathbf{k}+\mathbf{G}^{\prime})\cdot\mathbf{r}^{\prime}},

\hat{V}_{\mathbf{k}}(\mathbf{G},\mathbf{G}^{\prime})=\frac{1}{\lvert\Omega^{\ell}\rvert}\int_{\Omega^{\ell}\times\Omega^{\ell}}\mathrm{d}\mathbf{r}\,\mathrm{d}\mathbf{r}^{\prime}\,e^{-\mathrm{i}(\mathbf{k}+\mathbf{G})\cdot\mathbf{r}}V(\mathbf{r},\mathbf{r}^{\prime})e^{\mathrm{i}(\mathbf{k}+\mathbf{G}^{\prime})\cdot\mathbf{r}^{\prime}}

V(\mathbf{r}+\mathbf{r}^{\prime\prime},\mathbf{r}^{\prime}+\mathbf{r}^{\prime\prime})=V(\mathbf{r},\mathbf{r}^{\prime}),\quad\forall\mathbf{r}^{\prime\prime}\in\Omega^{\ell}.

Code & Models

Repositories

fhenneke/BSE_k_ISDF.jl (official)

Full text

Fast optical absorption spectra calculations for periodic solid state systems

Felix Henneke, Institut für Mathematik, Freie Universität Berlin, Germany

Lin Lin, Department of Mathematics, University of California, Berkeley, and Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720

Christian Vorwerk, Institut für Physik and IRIS Adlershof, Humboldt-Universität zu Berlin, Germany

Claudia Draxl, Institut für Physik and IRIS Adlershof, Humboldt-Universität zu Berlin, Germany

Rupert Klein, Institut für Mathematik, Freie Universität Berlin, Germany

Chao Yang, Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720

Abstract

We present a method to construct an efficient approximation to the bare exchange and screened direct interaction kernels of the Bethe-Salpeter Hamiltonian for periodic solid state systems via the interpolative separable density fitting technique. We show that the cost of constructing the approximate Bethe-Salpeter Hamiltonian scales nearly optimally as $\mathcal{O}(N_{k})$ with respect to the number of samples in the Brillouin zone $N_{k}$. In addition, we show that the cost for applying the Bethe-Salpeter Hamiltonian to a vector scales as $\mathcal{O}(N_{k}\log N_{k})$. Therefore the optical absorption spectrum, as well as selected excitation energies, can be efficiently computed via iterative methods such as the Lanczos method. This is a significant reduction from the $\mathcal{O}(N_{k}^{2})$ and $\mathcal{O}(N_{k}^{3})$ scaling associated with a brute force approach for constructing and diagonalizing the Hamiltonian, respectively. We demonstrate the efficiency and accuracy of this approach with both one-dimensional model problems and three-dimensional real materials (graphene and diamond). For the diamond system with $N_{k}=2197$, the brute force approach takes $6$ hours to assemble the Bethe-Salpeter Hamiltonian and $4$ hours to fully diagonalize it using $169$ cores. The new method takes less than $3$ minutes to set up the Hamiltonian and $24$ minutes to compute the absorption spectrum on a single core.

keywords:

Bethe–Salpeter equation, interpolative separable density fitting, optical absorption function

AMS:

65F15, 65Z05

1 Introduction

The Bethe–Salpeter equation (BSE), derived from many-body perturbation theory (MBPT), is a widely used method for describing the optical absorption process in molecules and solids [31, 32, 35, 23, 1, 24, 6]. It models the behavior of an electron–hole pair, which is an excitation process involving two quasi-particles. Solving the BSE requires constructing and diagonalizing a structured matrix, called the Bethe–Salpeter Hamiltonian (BSH). In the context of optical absorption, the eigenvalues of the BSH are the exciton energies and the corresponding eigenfunctions yield the exciton wavefunctions. The BSH consists of the so-called bare exchange and screened direct interaction kernels, which depend on single-particle orbitals obtained from a quasi-particle (usually at the GW level) or mean-field calculation. For isolated systems such as molecules, the construction of these kernels requires at least $\mathcal{O}(N_{e}^{5})$ operations in a conventional approach, where $N_{e}$ is the number of electrons in the system. This is very costly for large systems that contain hundreds of atoms or more. Recent efforts have actively explored methods for efficient representation of the BSH in order to reduce the high computational cost of BSE calculations [3, 13, 16, 21, 29, 26, 27, 30].

In a recent work [12], two of the authors presented an efficient way to construct the BSH for molecular systems and to solve the BSE eigenvalue problem using an iterative scheme. That approach is based on the recently developed interpolative separable density fitting (ISDF) decomposition [19, 20]. The ISDF decomposition has been applied to accelerate a number of applications in computational chemistry and materials science, including the computation of two-electron integrals [19], correlation energy in the random phase approximation [18], density functional perturbation theory [15], and hybrid density functional calculations [11, 7]. In this scheme, a matrix consisting of products of single-particle orbital pairs is efficiently approximated as a low-rank matrix product between a matrix built with a small number of auxiliary basis vectors and an expansion coefficient matrix. This decomposition allows us to construct efficient representations of the bare exchange and screened direct kernels. For isolated systems, the construction of the ISDF-compressed BSH matrix only requires $\mathcal{O}(N_{e}^{3})$ operations when the rank of the numerical auxiliary basis is kept at $\mathcal{O}(N_{e})$. This is a considerable reduction compared to the $\mathcal{O}(N_{e}^{5})$ complexity of a conventional approach. By keeping the interaction kernels in a decomposed form, the matrix–vector multiplications required in iterative diagonalization procedures for the Hamiltonian $H_{\text{BSE}}$ can be performed efficiently. We can further use these efficient matrix–vector multiplications in a structure-preserving Lanczos algorithm [33] to obtain an approximate absorption spectrum without an explicit diagonalization of the approximate $H_{\text{BSE}}$.

This paper generalizes the work in [12] to periodic solid state systems. According to the Bloch decomposition, each single-particle orbital in a periodic system can be characterized by an orbital index $i$ and a Brillouin zone index $\mathbf{k}$. In contrast to isolated systems, the total number of electrons $N_{e}$ equals the number of electrons per unit cell multiplied by the number of $\mathbf{k}$-points, denoted by $N_{k}$. It has been observed that for many extended systems, the number of orbitals (both occupied and virtual) required for one particular $\mathbf{k}$ index can be relatively small and is independent of $N_{e}$. Hence the difficulty of optical absorption spectra calculations for periodic systems mainly arises from the large number of $\mathbf{k}$-points. This is particularly the case when the excitons are delocalized in real space, or when the Fermi surface is not smooth (such as for graphene and other metallic systems). In such cases, $N_{k}$ can often be rather large (from hundreds to hundreds of thousands; see e.g. [28], where a $120\times 120\times 1$ $\mathbf{k}$-grid is used for the quasi two-dimensional MoS2 system) in order to properly discretize and sample the Brillouin zone. The cost for constructing the bare exchange and screened direct kernels scales as $\mathcal{O}(N_{k}^{2})$, while the cost for diagonalizing the corresponding BSH scales as $\mathcal{O}(N_{k}^{3})$. This is prohibitively expensive when a dense discretization of the Brillouin zone is needed.

With the help of ISDF, we can find a reduced representation of the pair product orbitals in the periodic setting [20]. Such a reduced representation is possible thanks to the smoothness of the single-particle orbitals with respect to the $\mathbf{k}$ index and to the fact that the Brillouin zone is a compact domain. We will show that we can reduce the complexity of the bare exchange and screened direct kernel construction for extended systems to the optimal complexity of $\mathcal{O}(N_{k})$. Instead of diagonalizing the BSH directly, we use iterative methods such as the Lanczos method to evaluate the optical absorption spectrum. The complexity of applying the approximated kernels to a vector is only $\mathcal{O}(N_{k}\log N_{k})$ with respect to $N_{k}$. The same strategy can be applied to evaluate selected excitation energies.

The rest of the paper is organized as follows. We first provide a concise review of the single particle theory and the Bethe-Salpeter equation for periodic systems in section 2. We could not find a precise mathematical description of how the BSH is constructed for periodic systems with a discretized Brillouin zone in the literature. We therefore provide a self-contained derivation in section 2.2. The interpolative separable density fitting for periodic systems is introduced in section 3, and the application of the approximate BSH in the ISDF format to a vector in section 4. The numerical results are presented in section 5, followed by a conclusion in section 6.

2 Preliminaries

2.1 Single particle theory for periodic systems

To facilitate further discussion we briefly review Bloch-Floquet theory for periodic systems. Without loss of generality we consider a three-dimensional crystal. The Bravais lattice with lattice vectors $\mathbf{a}_{1},\mathbf{a}_{2},\mathbf{a}_{3}\in\mathbb{R}^{3}$ is defined as

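$$\mathbb{L}=\left\{\mathbf{R}~\Big|~\mathbf{R}=n_{1}\mathbf{a}_{1}+n_{2}\mathbf{a}_{2}+n_{3}\mathbf{a}_{3},\ n_{1},n_{2},n_{3}\in\mathbb{Z}\right\}.$$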

In single particle theories such as the Kohn-Sham density functional theory, the self-consistent effective potential $V_{\text{eff}}$ is real-valued and $\mathbb{L}$ -periodic, i.e.

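$$V_{\text{eff}}(\mathbf{r}+\mathbf{R})=V_{\text{eff}}(\mathbf{r}),\quad\forall\mathbf{r}\in\mathbb{R}^{3},\ \mathbf{R}\in\mathbb{L}.$$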

The unit cell is defined as

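$$\Omega=\left\{\mathbf{r}=c_{1}\mathbf{a}_{1}+c_{2}\mathbf{a}_{2}+c_{3}\mathbf{a}_{3}~\Big|~0\leq c_{1},c_{2},c_{3}<1\right\}.$$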

The Bravais lattice induces a reciprocal lattice $\mathbb{L}^{*}$ , with its lattice vectors $\mathbf{b}_{1},\mathbf{b}_{2},\mathbf{b}_{3}$ satisfying $\mathbf{a}_{\alpha}\cdot\mathbf{b}_{\beta}=2\pi\delta_{\alpha\beta},\alpha,\beta\in\{1,2,3\}$ . The unit cell of the reciprocal lattice is called the (first) Brillouin zone and denoted by $\Omega^{*}$ , defined as

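$$\Omega^{*}=\left\{\mathbf{k}=k_{1}\mathbf{b}_{1}+k_{2}\mathbf{b}_{2}+k_{3}\mathbf{b}_{3}~\Big|~-\frac{1}{2}\leq k_{1},k_{2},k_{3}<\frac{1}{2}\right\}.$$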

The Brillouin zone has a number of special points related to the symmetry of the crystal. The most commonly used special point is the $\Gamma$-point, which corresponds to $\mathbf{k}=(0,0,0)^{\top}$.
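The duality relation $\mathbf{a}_{\alpha}\cdot\mathbf{b}_{\beta}=2\pi\delta_{\alpha\beta}$ can be checked numerically. The following is a minimal sketch, assuming the standard cross-product construction of the reciprocal vectors; the function names and the fcc primitive cell with lattice constant 1 are illustrative choices, not taken from the paper.

```python
import math

def cross(u, v):
    """Cross product of two 3-vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def dot(u, v):
    """Euclidean inner product."""
    return sum(x * y for x, y in zip(u, v))

def reciprocal_lattice(a1, a2, a3):
    """Return (b1, b2, b3) with a_alpha . b_beta = 2*pi*delta_{alpha,beta}."""
    volume = dot(a1, cross(a2, a3))          # signed unit-cell volume
    scale = 2.0 * math.pi / volume
    b1 = tuple(scale * c for c in cross(a2, a3))
    b2 = tuple(scale * c for c in cross(a3, a1))
    b3 = tuple(scale * c for c in cross(a1, a2))
    return b1, b2, b3

# Illustrative input: fcc primitive vectors with lattice constant 1.
a = [(0.0, 0.5, 0.5), (0.5, 0.0, 0.5), (0.5, 0.5, 0.0)]
b = reciprocal_lattice(*a)

# gram[i][j] = a_i . b_j / (2*pi) should be the identity matrix.
gram = [[dot(a[i], b[j]) / (2.0 * math.pi) for j in range(3)] for i in range(3)]

# The two cell volumes should satisfy |Omega*| = (2*pi)^3 / |Omega|.
cell_volume = abs(dot(a[0], cross(a[1], a[2])))
bz_volume = abs(dot(b[0], cross(b[1], b[2])))
```

Here `gram` plays the role of $\delta_{\alpha\beta}$, and `bz_volume * cell_volume` should agree with $(2\pi)^{3}$ up to rounding.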

According to the Bloch-Floquet theory, the spectrum of the Hamiltonian $\mathcal{H}=-\frac{1}{2}\nabla_{\mathbf{r}}^{2}+V_{\text{eff}}(\mathbf{r})$ can be relabeled using two indices $(i,\mathbf{k})$ , where $i\in\mathbb{N}$ is called the band index and $\mathbf{k}\in\Omega^{*}$ is the Brillouin zone index. Each generalized eigenfunction $\psi_{i\mathbf{k}}(\mathbf{r})$ is known as a Bloch orbital and satisfies $\mathcal{H}\psi_{i\mathbf{k}}(\mathbf{r})=\epsilon_{i\mathbf{k}}\psi_{i\mathbf{k}}(\mathbf{r})$ with Bloch boundary conditions $\psi_{i\mathbf{k}}(\mathbf{r}+\mathbf{R})=e^{\mathrm{i}\mathbf{k}\cdot\mathbf{R}}\psi_{i\mathbf{k}}(\mathbf{r})$ for any $\mathbf{R}\in\mathbb{L}$ . Furthermore, $\psi_{i\mathbf{k}}$ can be decomposed using the Bloch decomposition

$$\psi_{i\mathbf{k}}(\mathbf{r})=e^{\mathrm{i}\mathbf{k}\cdot\mathbf{r}}u_{i\mathbf{k}}(\mathbf{r}),$$

where $u_{i\mathbf{k}}(\mathbf{r})$ is the periodic part of $\psi_{i\mathbf{k}}(\mathbf{r})$ satisfying the periodic boundary condition on the unit cell

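$$u_{i\mathbf{k}}(\mathbf{r}+\mathbf{R})=u_{i\mathbf{k}}(\mathbf{r}),\quad\forall\mathbf{R}\in\mathbb{L}.$$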

It can be directly obtained by solving the eigenvalue problem

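$$\mathcal{H}(\mathbf{k})u_{i\mathbf{k}}=\epsilon_{i\mathbf{k}}u_{i\mathbf{k}}(\mathbf{r}),\quad\mathbf{r}\in\Omega,\ \mathbf{k}\in\Omega^{*},$$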

where $\mathcal{H}(\mathbf{k})=-\frac{1}{2}(\nabla_{\mathbf{r}}+\mathrm{i}\mathbf{k})^{2}+V_{\text{eff}}(\mathbf{r})$ . For each $\mathbf{k}\in\Omega^{*}$ , the eigenvalues $\epsilon_{i\mathbf{k}}$ are ordered non-decreasingly. For a fixed $i$ , $\{\epsilon_{i\mathbf{k}}\}$ as a function of $\mathbf{k}$ is called a Bloch band. The collection of all eigenvalues forms the band structure of the crystal, which characterizes the spectrum of the operator $\mathcal{H}$ .
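As an illustration of how the band index arises, consider the simplest possible case. The sketch below assumes a one-dimensional crystal with $V_{\text{eff}}=0$ (the empty lattice, a simplification not used in the paper), for which the plane waves $e^{\mathrm{i}Gr}$ are exact eigenfunctions of $\mathcal{H}(k)$ with eigenvalues $\frac{1}{2}(k+G)^{2}$. The function name and parameters are hypothetical.

```python
import math

def empty_lattice_bands(k, a=1.0, n_bands=4, g_max=5):
    """Lowest eigenvalues of H(k) = -(1/2) (d/dr + i k)^2 in 1D with V_eff = 0.

    Each plane wave exp(i G r), G = 2*pi*m/a, is an eigenfunction with
    eigenvalue (k + G)^2 / 2. Sorting the eigenvalues for fixed k gives the
    non-decreasing ordering that defines the band index i.
    """
    b = 2.0 * math.pi / a                       # reciprocal lattice constant
    energies = sorted(0.5 * (k + m * b) ** 2 for m in range(-g_max, g_max + 1))
    return energies[:n_bands]

# Bands at the Gamma point and at a point halfway to the zone boundary.
bands_gamma = empty_lattice_bands(0.0)
bands_mid = empty_lattice_bands(0.25 * 2.0 * math.pi)
```

Evaluating `empty_lattice_bands` on a set of $k$ values and collecting the $i$-th entry of each result traces out the $i$-th Bloch band of this toy model.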

In the discussion below, we denote by $N_{v}$ the number of valence bands (i.e., occupied orbitals per unit cell in the ground state), $N_{c}$ the number of conduction bands (i.e. unoccupied orbitals per unit cell in the ground state). We also define $N=N_{v}+N_{c}$ . We assume the systems to be insulating, in the sense that the following band isolation conditions between the valence and conduction bands are satisfied:

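$$\inf\lvert\epsilon_{i\mathbf{k}}-\epsilon_{i^{\prime}\mathbf{k}^{\prime}}\rvert:=\epsilon_{g}>0,\quad\mathbf{k},\mathbf{k}^{\prime}\in\Omega^{*},\ 1\leq i\leq N_{v},\ N_{v}+1\leq i^{\prime}\leq N.$$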

Denote by $\lvert\Omega\rvert$ the volume of the unit cell, and

$$\lvert\Omega^{*}\rvert=\frac{(2\pi)^{3}}{\lvert\Omega\rvert}$$

the volume of the Brillouin zone. The Bloch orbitals $\{\psi_{i\mathbf{k}}\}$ satisfy the orthonormality condition in the distributional sense

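$$\int_{\mathbb{R}^{3}}\psi^{*}_{i^{\prime}\mathbf{k}^{\prime}}(\mathbf{r})\psi_{i,\mathbf{k}}(\mathbf{r})\,\mathrm{d}\mathbf{r}=\lvert\Omega^{*}\rvert\,\delta_{i^{\prime},i}\,\delta(\mathbf{k}^{\prime}-\mathbf{k}).\tag{2.7}$$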

Here $\delta_{i^{\prime},i}$ is the Kronecker $\delta$ symbol for a discrete set, while $\delta(\mathbf{k}^{\prime}-\mathbf{k})$ is the Dirac-delta distribution. Equation (2.7) implies the normalization condition when integrated over the Brillouin zone

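$$\frac{1}{\lvert\Omega^{*}\rvert}\int_{\Omega^{*}}\int_{\mathbb{R}^{3}}\psi^{*}_{i^{\prime}\mathbf{k}}(\mathbf{r})\psi_{i\mathbf{k}}(\mathbf{r})\,\mathrm{d}\mathbf{r}\,\mathrm{d}\mathbf{k}=\delta_{i^{\prime},i}.\tag{2.8}$$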

From the Bloch orbitals, the ground state electron density can be constructed as

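$$\rho(\mathbf{r})=\frac{1}{\lvert\Omega^{*}\rvert}\int_{\Omega^{*}}\sum_{i=1}^{N_{v}}\lvert\psi_{i\mathbf{k}}(\mathbf{r})\rvert^{2}\,\mathrm{d}\mathbf{k}=\frac{1}{\lvert\Omega^{*}\rvert}\int_{\Omega^{*}}\sum_{i=1}^{N_{v}}\lvert u_{i\mathbf{k}}(\mathbf{r})\rvert^{2}\,\mathrm{d}\mathbf{k}.\tag{2.9}$$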

In order to practically perform calculations for periodic systems, the integration with respect to the Brillouin zone $\Omega^{*}$ needs to be discretized using a quadrature. The most commonly used scheme is based on the Monkhorst-Pack grid [22]

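$$\mathcal{K}^{\ell}_{\mathbf{s}}=\left\{\sum_{\alpha=1}^{3}\frac{m_{\alpha}-s_{\alpha}}{N^{\ell}_{\alpha}}\mathbf{b}_{\alpha}\;\Big|\;m_{\alpha}=-\frac{N^{\ell}_{\alpha}}{2}+1,\ldots,\frac{N^{\ell}_{\alpha}}{2},\quad 0\leq s_{\alpha}<1,\quad\alpha=1,2,3\right\}.\tag{2.10}$$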

It is clear that $\mathcal{K}^{\ell}_{\mathbf{s}}\subset\Omega^{*}$ and that it corresponds to a uniform discretization of the Brillouin zone. When the shift vector is $\mathbf{s}=\mathbf{0}$, we write $\mathcal{K}^{\ell}:=\mathcal{K}^{\ell}_{\mathbf{0}}$, and the calculation of periodic systems can be equivalently performed using a supercell consisting of $N^{\ell}_{1}\times N^{\ell}_{2}\times N^{\ell}_{3}$ unit cells. The supercell is denoted by $\Omega^{\ell}$ and is equipped with periodic boundary conditions, called the Born-von Karman boundary conditions [2]. The calculation of a periodic crystal can thus be recovered by taking the limit $N^{\ell}_{\alpha}\to\infty$. We denote by $N_{k}\equiv N^{\ell}:=N^{\ell}_{1}N^{\ell}_{2}N^{\ell}_{3}$ the total number of unit cells, or equivalently the total number of Monkhorst-Pack grid points in the Brillouin zone.
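The Monkhorst-Pack construction can be sketched in a few lines. The example below follows the index range $m_{\alpha}=-N^{\ell}_{\alpha}/2+1,\ldots,N^{\ell}_{\alpha}/2$ literally, so it assumes even $N^{\ell}_{\alpha}$; odd grid sizes would need a rounding convention that the text does not spell out. The function name is hypothetical, and the points are returned as fractional coordinates with respect to $\mathbf{b}_{1},\mathbf{b}_{2},\mathbf{b}_{3}$.

```python
from fractions import Fraction
from itertools import product

def monkhorst_pack(n, shift=(0, 0, 0)):
    """Fractional coordinates (m_alpha - s_alpha) / N_alpha of the grid points.

    n     -- grid sizes (N_1, N_2, N_3), each assumed even in this sketch
    shift -- shift vector s with 0 <= s_alpha < 1
    """
    axes = []
    for n_a, s_a in zip(n, shift):
        if n_a % 2 != 0:
            raise ValueError("the index range -N/2+1, ..., N/2 needs an even N")
        m_values = range(-n_a // 2 + 1, n_a // 2 + 1)
        axes.append([(Fraction(m) - Fraction(s_a)) / n_a for m in m_values])
    return list(product(*axes))

# Illustrative 4 x 4 x 2 grid without shift.
grid = monkhorst_pack((4, 4, 2))
n_k = len(grid)                      # total number of k-points, N_1 * N_2 * N_3
has_gamma = (0, 0, 0) in grid        # the unshifted grid contains the Gamma point
```

Exact rational arithmetic is used only to make the uniform spacing $1/N^{\ell}_{\alpha}$ easy to inspect; a production code would typically work with floating-point Cartesian coordinates.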

Assuming the Brillouin zone is discretized using $\mathcal{K}^{\ell}$ , the orthogonality condition (2.7) becomes

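$$\int_{\Omega^{\ell}}\psi^{*}_{i^{\prime}\mathbf{k}^{\prime}}(\mathbf{r})\psi_{i\mathbf{k}}(\mathbf{r})\,\mathrm{d}\mathbf{r}=\delta_{i^{\prime},i}\,\delta_{\mathbf{k}^{\prime},\mathbf{k}},\quad\mathbf{k},\mathbf{k}^{\prime}\in\mathcal{K}^{\ell}.\tag{2.11}$$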

We also modify the Bloch decomposition as

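$$\psi_{i\mathbf{k}}(\mathbf{r})=\frac{1}{\sqrt{N^{\ell}}}e^{\mathrm{i}\mathbf{k}\cdot\mathbf{r}}u_{i\mathbf{k}}(\mathbf{r}),\quad\mathbf{k}\in\mathcal{K}^{\ell}.\tag{2.12}$$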

Here the normalization factor $1/\sqrt{N^{\ell}}$ is introduced so that the orthogonality condition for the periodic part implies

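$$\int_{\Omega}u^{*}_{i^{\prime}\mathbf{k}}(\mathbf{r})u_{i\mathbf{k}}(\mathbf{r})\,\mathrm{d}\mathbf{r}=\delta_{i^{\prime},i},\quad\mathbf{k}\in\mathcal{K}^{\ell}.\tag{2.13}$$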

To facilitate the book-keeping effort of various relevant constants in practical calculations, in the discussion below we will always assume that the Brillouin zone is discretized into $\mathcal{K}^{\ell}$ with a corresponding supercell $\Omega^{\ell}$ . The volume of the supercell is $\lvert\Omega^{\ell}\rvert=N^{\ell}\lvert\Omega\rvert=N_{k}\lvert\Omega\rvert$ . The unit cell is further discretized into a uniform grid $\{\mathbf{r}_{i}\}_{i=1}^{N_{g}}$ . Practical BSE calculations often truncate the number of conduction bands aggressively, in the sense that $N_{g}\gg N_{v}+N_{c}=:N$ . Numerical results indicate that in many cases, the low-lying excitation spectrum is relatively insensitive to $N_{c}$ , and one can often choose $N_{c}\approx N_{v}$ . Unless otherwise clarified, we may not distinguish a continuous vector $u(\mathbf{r})$ and the corresponding discretized vector $\{u(\mathbf{r}_{i})\}$ . Similarly, when the context is clear, we do not distinguish the kernel of an operator $A(\mathbf{r},\mathbf{r}^{\prime})$ and its discretized matrix $\{A(\mathbf{r}_{i},\mathbf{r}_{j})\}$ .

2.2 Bethe-Salpeter equation for periodic systems

The Bethe–Salpeter equation is an eigenvalue problem of the form

$$H_{\text{BSE}}X=EX,\tag{2.14}$$

where $H_{\text{BSE}}$ is the Bethe–Salpeter Hamiltonian (BSH), $X$ is the exciton wavefunction, and $E$ is the corresponding exciton energy. For periodic systems, the BSH has the following block structure

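$$H_{\text{BSE}}=\begin{bmatrix}D+2V_{A}-W_{A}&2V_{B}-W_{B}\\-2\overline{V}_{B}+\overline{W}_{B}&-D-2\overline{V}_{A}+\overline{W}_{A}\end{bmatrix},\tag{2.15}$$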

where $D(i_{v}i_{c}\mathbf{k},j_{v}j_{c}\mathbf{k}^{\prime})=(\epsilon_{i_{c}\mathbf{k}}-\epsilon_{i_{v}\mathbf{k}})\delta_{i_{v},j_{v}}\delta_{i_{c},j_{c}}\delta_{\mathbf{k},\mathbf{k}^{\prime}}$ is an $(N_{v}N_{c}N_{k})\times(N_{v}N_{c}N_{k})$ diagonal matrix. The quasi-particle energies $\epsilon_{i_{v}\mathbf{k}},\epsilon_{i_{c}\mathbf{k}}$ are typically obtained from a GW calculation [31]. The $V_{A}$ and $V_{B}$ matrices represent the bare exchange interaction of electron–hole pairs, and the $W_{A}$ and $W_{B}$ matrices are referred to as the screened direct interaction of electron–hole pairs. These matrices are defined as follows:

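$$\begin{aligned}
V_{A}(i_{v}i_{c}\mathbf{k},j_{v}j_{c}\mathbf{k}^{\prime})&=\int_{\Omega^{\ell}\times\Omega^{\ell}}\bar{\psi}_{i_{c}\mathbf{k}}(\mathbf{r})\psi_{i_{v}\mathbf{k}}(\mathbf{r})V(\mathbf{r},\mathbf{r}^{\prime})\bar{\psi}_{j_{v}\mathbf{k}^{\prime}}(\mathbf{r}^{\prime})\psi_{j_{c}\mathbf{k}^{\prime}}(\mathbf{r}^{\prime})\,\mathrm{d}\mathbf{r}\,\mathrm{d}\mathbf{r}^{\prime},\\
V_{B}(i_{v}i_{c}\mathbf{k},j_{v}j_{c}\mathbf{k}^{\prime})&=\int_{\Omega^{\ell}\times\Omega^{\ell}}\bar{\psi}_{i_{c}\mathbf{k}}(\mathbf{r})\psi_{i_{v}\mathbf{k}}(\mathbf{r})V(\mathbf{r},\mathbf{r}^{\prime})\bar{\psi}_{j_{c}\mathbf{k}^{\prime}}(\mathbf{r}^{\prime})\psi_{j_{v}\mathbf{k}^{\prime}}(\mathbf{r}^{\prime})\,\mathrm{d}\mathbf{r}\,\mathrm{d}\mathbf{r}^{\prime},\\
W_{A}(i_{v}i_{c}\mathbf{k},j_{v}j_{c}\mathbf{k}^{\prime})&=\int_{\Omega^{\ell}\times\Omega^{\ell}}\bar{\psi}_{i_{c}\mathbf{k}}(\mathbf{r})\psi_{j_{c}\mathbf{k}^{\prime}}(\mathbf{r})W(\mathbf{r},\mathbf{r}^{\prime})\bar{\psi}_{j_{v}\mathbf{k}^{\prime}}(\mathbf{r}^{\prime})\psi_{i_{v}\mathbf{k}}(\mathbf{r}^{\prime})\,\mathrm{d}\mathbf{r}\,\mathrm{d}\mathbf{r}^{\prime},\\
W_{B}(i_{v}i_{c}\mathbf{k},j_{v}j_{c}\mathbf{k}^{\prime})&=\int_{\Omega^{\ell}\times\Omega^{\ell}}\bar{\psi}_{i_{c}\mathbf{k}}(\mathbf{r})\psi_{j_{v}\mathbf{k}^{\prime}}(\mathbf{r})W(\mathbf{r},\mathbf{r}^{\prime})\bar{\psi}_{j_{c}\mathbf{k}^{\prime}}(\mathbf{r}^{\prime})\psi_{i_{v}\mathbf{k}}(\mathbf{r}^{\prime})\,\mathrm{d}\mathbf{r}\,\mathrm{d}\mathbf{r}^{\prime}.
\end{aligned}\tag{2.16}$$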

Here $\psi_{i_{v}\mathbf{k}}$ and $\psi_{i_{c}\mathbf{k}}$ are the valence and conduction single-particle orbitals, respectively, typically obtained from a Kohn–Sham density functional theory (KSDFT) calculation, and $V(\mathbf{r},\mathbf{r^{\prime}})$ and $W(\mathbf{r},\mathbf{r^{\prime}})$ are the bare and screened Coulomb interactions. Both $V_{A}$ and $W_{A}$ are Hermitian, whereas $V_{B}$ and $W_{B}$ are complex symmetric. Within the so-called Tamm–Dancoff approximation (TDA) [24], both $V_{B}$ and $W_{B}$ are neglected in Equation (2.15). In this case, $H_{\text{BSE}}$ becomes Hermitian and we can focus on computing the upper left block of $H_{\text{BSE}}$.
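To get a feeling for the size of the blocks involved, the dimension $N_{v}N_{c}N_{k}$ can be evaluated for the diamond and graphene parameters listed in Table 1. The following sketch does only this arithmetic; the storage estimate assumes double-precision complex entries (16 bytes each), which is an assumption of this example rather than a statement from the paper.

```python
# Band and k-point counts copied from Table 1 of this document.
systems = {
    "diamond": {"n_v": 4, "n_c": 10, "n_k": 13 * 13 * 13},
    "graphene": {"n_v": 4, "n_c": 5, "n_k": 42 * 42 * 1},
}

BYTES_PER_ENTRY = 16  # assumption: double-precision complex storage

def tda_block_dimension(n_v, n_c, n_k):
    """Dimension N_v * N_c * N_k of D and of each V and W block."""
    return n_v * n_c * n_k

def dense_storage_gib(dim):
    """Memory in GiB needed to store one dense dim x dim block explicitly."""
    return dim * dim * BYTES_PER_ENTRY / 2**30

report = {}
for name, params in systems.items():
    dim = tda_block_dimension(**params)
    report[name] = (dim, dense_storage_gib(dim))
    print(f"{name}: dimension {dim}, one dense block ~{report[name][1]:.1f} GiB")
```

Under the stated storage assumption, a single dense block for the diamond parameters is on the order of 100 GiB, which illustrates why keeping the kernels in a factored form is attractive.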

In the following discussion, when a single index $i$ is used, it refers to either $i_{v}$ or $i_{c}$ . Using the Bloch decomposition (2.12), the matrix elements of the BSH can be written using the periodic part of the orbitals as

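$$\begin{aligned}
V_{A}(i_{v}i_{c}\mathbf{k},j_{v}j_{c}\mathbf{k}^{\prime})&=\frac{1}{N_{k}^{2}}\int_{\Omega^{\ell}\times\Omega^{\ell}}\bar{u}_{i_{c}\mathbf{k}}(\mathbf{r})u_{i_{v}\mathbf{k}}(\mathbf{r})V(\mathbf{r},\mathbf{r}^{\prime})\bar{u}_{j_{v}\mathbf{k}^{\prime}}(\mathbf{r}^{\prime})u_{j_{c}\mathbf{k}^{\prime}}(\mathbf{r}^{\prime})\,\mathrm{d}\mathbf{r}\,\mathrm{d}\mathbf{r}^{\prime},\\
V_{B}(i_{v}i_{c}\mathbf{k},j_{v}j_{c}\mathbf{k}^{\prime})&=\frac{1}{N_{k}^{2}}\int_{\Omega^{\ell}\times\Omega^{\ell}}\bar{u}_{i_{c}\mathbf{k}}(\mathbf{r})u_{i_{v}\mathbf{k}}(\mathbf{r})V(\mathbf{r},\mathbf{r}^{\prime})\bar{u}_{j_{c}\mathbf{k}^{\prime}}(\mathbf{r}^{\prime})u_{j_{v}\mathbf{k}^{\prime}}(\mathbf{r}^{\prime})\,\mathrm{d}\mathbf{r}\,\mathrm{d}\mathbf{r}^{\prime},\\
W_{A}(i_{v}i_{c}\mathbf{k},j_{v}j_{c}\mathbf{k}^{\prime})&=\frac{1}{N_{k}^{2}}\int_{\Omega^{\ell}\times\Omega^{\ell}}e^{-\mathrm{i}(\mathbf{k}-\mathbf{k}^{\prime})\cdot(\mathbf{r}-\mathbf{r}^{\prime})}\bar{u}_{i_{c}\mathbf{k}}(\mathbf{r})u_{j_{c}\mathbf{k}^{\prime}}(\mathbf{r})W(\mathbf{r},\mathbf{r}^{\prime})\bar{u}_{j_{v}\mathbf{k}^{\prime}}(\mathbf{r}^{\prime})u_{i_{v}\mathbf{k}}(\mathbf{r}^{\prime})\,\mathrm{d}\mathbf{r}\,\mathrm{d}\mathbf{r}^{\prime},\\
W_{B}(i_{v}i_{c}\mathbf{k},j_{v}j_{c}\mathbf{k}^{\prime})&=\frac{1}{N_{k}^{2}}\int_{\Omega^{\ell}\times\Omega^{\ell}}e^{-\mathrm{i}(\mathbf{k}-\mathbf{k}^{\prime})\cdot(\mathbf{r}-\mathbf{r}^{\prime})}\bar{u}_{i_{c}\mathbf{k}}(\mathbf{r})u_{j_{v}\mathbf{k}^{\prime}}(\mathbf{r})W(\mathbf{r},\mathbf{r}^{\prime})\bar{u}_{j_{c}\mathbf{k}^{\prime}}(\mathbf{r}^{\prime})u_{i_{v}\mathbf{k}}(\mathbf{r}^{\prime})\,\mathrm{d}\mathbf{r}\,\mathrm{d}\mathbf{r}^{\prime}.
\end{aligned}\tag{2.17}$$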

Note that $V_{A},V_{B}$ in Eq. (2.17) do not involve the phase factors, since the factor $e^{\mathrm{i}\mathbf{k}\cdot\mathbf{r}}$ exactly cancels due to the complex conjugate operation. The phase factor only appears in the $W_{A},W_{B}$ terms.
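The cancellation can be made explicit with a short worked step. It uses only the Bloch decomposition with the normalization factor $1/\sqrt{N^{\ell}}$ and $N^{\ell}=N_{k}$ introduced in Section 2.1; the three lines show the pair products that enter the $V$ terms and the $W_{A}$ term, respectively.

```latex
\begin{aligned}
\bar{\psi}_{i_{c}\mathbf{k}}(\mathbf{r})\,\psi_{i_{v}\mathbf{k}}(\mathbf{r})
  &= \frac{1}{N_{k}}\,e^{-\mathrm{i}\mathbf{k}\cdot\mathbf{r}}\,e^{\mathrm{i}\mathbf{k}\cdot\mathbf{r}}\,
     \bar{u}_{i_{c}\mathbf{k}}(\mathbf{r})\,u_{i_{v}\mathbf{k}}(\mathbf{r})
   = \frac{1}{N_{k}}\,\bar{u}_{i_{c}\mathbf{k}}(\mathbf{r})\,u_{i_{v}\mathbf{k}}(\mathbf{r}),\\
\bar{\psi}_{i_{c}\mathbf{k}}(\mathbf{r})\,\psi_{j_{c}\mathbf{k}^{\prime}}(\mathbf{r})
  &= \frac{1}{N_{k}}\,e^{-\mathrm{i}(\mathbf{k}-\mathbf{k}^{\prime})\cdot\mathbf{r}}\,
     \bar{u}_{i_{c}\mathbf{k}}(\mathbf{r})\,u_{j_{c}\mathbf{k}^{\prime}}(\mathbf{r}),\\
\bar{\psi}_{j_{v}\mathbf{k}^{\prime}}(\mathbf{r}^{\prime})\,\psi_{i_{v}\mathbf{k}}(\mathbf{r}^{\prime})
  &= \frac{1}{N_{k}}\,e^{\mathrm{i}(\mathbf{k}-\mathbf{k}^{\prime})\cdot\mathbf{r}^{\prime}}\,
     \bar{u}_{j_{v}\mathbf{k}^{\prime}}(\mathbf{r}^{\prime})\,u_{i_{v}\mathbf{k}}(\mathbf{r}^{\prime}).
\end{aligned}
```

In the exchange terms both orbitals of a pair carry the same $\mathbf{k}$, so the phases cancel pairwise. In the direct terms each pair mixes $\mathbf{k}$ and $\mathbf{k}^{\prime}$, and multiplying the last two lines gives the overall factor $e^{-\mathrm{i}(\mathbf{k}-\mathbf{k}^{\prime})\cdot(\mathbf{r}-\mathbf{r}^{\prime})}/N_{k}^{2}$.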

Eq. (2.17) requires the evaluation of integrals of the following form

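$$\mathcal{V}(f,g):=\frac{1}{N_{k}}\int_{\Omega^{\ell}\times\Omega^{\ell}}\bar{f}(\mathbf{r})V(\mathbf{r},\mathbf{r}^{\prime})g(\mathbf{r}^{\prime})\,\mathrm{d}\mathbf{r}\,\mathrm{d}\mathbf{r}^{\prime},\tag{2.18}$$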

and

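$$\mathcal{W}_{\mathbf{q}}(f,g):=\frac{1}{N_{k}}\int_{\Omega^{\ell}\times\Omega^{\ell}}e^{-\mathrm{i}\mathbf{q}\cdot(\mathbf{r}-\mathbf{r}^{\prime})}\bar{f}(\mathbf{r})W(\mathbf{r},\mathbf{r}^{\prime})g(\mathbf{r}^{\prime})\,\mathrm{d}\mathbf{r}\,\mathrm{d}\mathbf{r}^{\prime}.\tag{2.19}$$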

Using such notation,

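$$\begin{aligned}
V_{A}(i_{v}i_{c}\mathbf{k},j_{v}j_{c}\mathbf{k}^{\prime})&=\frac{1}{N_{k}}\mathcal{V}(\bar{u}_{i_{v}\mathbf{k}}u_{i_{c}\mathbf{k}},\bar{u}_{j_{v}\mathbf{k}^{\prime}}u_{j_{c}\mathbf{k}^{\prime}}),\\
V_{B}(i_{v}i_{c}\mathbf{k},j_{v}j_{c}\mathbf{k}^{\prime})&=\frac{1}{N_{k}}\mathcal{V}(\bar{u}_{i_{v}\mathbf{k}}u_{i_{c}\mathbf{k}},\bar{u}_{j_{c}\mathbf{k}^{\prime}}u_{j_{v}\mathbf{k}^{\prime}}),\\
W_{A}(i_{v}i_{c}\mathbf{k},j_{v}j_{c}\mathbf{k}^{\prime})&=\frac{1}{N_{k}}\mathcal{W}_{\mathbf{k}-\mathbf{k}^{\prime}}(\bar{u}_{j_{c}\mathbf{k}^{\prime}}u_{i_{c}\mathbf{k}},\bar{u}_{j_{v}\mathbf{k}^{\prime}}u_{i_{v}\mathbf{k}}),\\
W_{B}(i_{v}i_{c}\mathbf{k},j_{v}j_{c}\mathbf{k}^{\prime})&=\frac{1}{N_{k}}\mathcal{W}_{\mathbf{k}-\mathbf{k}^{\prime}}(\bar{u}_{j_{v}\mathbf{k}^{\prime}}u_{i_{c}\mathbf{k}},\bar{u}_{j_{c}\mathbf{k}^{\prime}}u_{i_{v}\mathbf{k}}).
\end{aligned}\tag{2.20}$$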

In Eq. (2.18), (2.19), $f,g$ are periodic functions in the unit cell, and can be represented using their Fourier representations. For instance,

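$$f(\mathbf{r})=\sum_{\mathbf{G}\in\mathbb{L}^{*}}\hat{f}(\mathbf{G})e^{\mathrm{i}\mathbf{G}\cdot\mathbf{r}},\tag{2.21}$$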

and its Fourier coefficients can be computed as

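$$\hat{f}(\mathbf{G})=\frac{1}{\lvert\Omega\rvert}\int_{\Omega}e^{-\mathrm{i}\mathbf{G}\cdot\mathbf{r}}f(\mathbf{r})\,\mathrm{d}\mathbf{r}.\tag{2.22}$$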

Hence Parseval’s identity reads

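$$\int_{\Omega}\bar{f}(\mathbf{r})g(\mathbf{r})\,\mathrm{d}\mathbf{r}=\lvert\Omega\rvert\sum_{\mathbf{G}\in\mathbb{L}^{*}}\bar{\hat{f}}(\mathbf{G})\hat{g}(\mathbf{G}).\tag{2.23}$$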
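Parseval's identity with this normalization of the Fourier coefficients can be verified on a uniform grid. The sketch below is a one-dimensional check under illustrative assumptions: a unit cell of length $a=2$, $n=16$ grid points, and two arbitrary trigonometric test functions. The integral is replaced by the uniform quadrature and the sum over $\mathbb{L}^{*}$ by the $n$ discrete frequencies, for which the identity holds exactly.

```python
import cmath
import math

def fourier_coefficients(samples):
    """Discrete analogue of f_hat(G_m) = (1/|Omega|) * int exp(-i G_m r) f(r) dr."""
    n = len(samples)
    return [
        sum(v * cmath.exp(-2j * math.pi * m * j / n) for j, v in enumerate(samples)) / n
        for m in range(n)
    ]

a = 2.0   # illustrative unit-cell length |Omega|
n = 16    # number of uniform grid points r_j = j * a / n
r = [j * a / n for j in range(n)]

# Two periodic test functions on the unit cell.
f = [cmath.exp(2j * math.pi * x / a) + 0.5 for x in r]
g = [math.cos(4 * math.pi * x / a) + 1j * math.sin(2 * math.pi * x / a) for x in r]

# Left-hand side: quadrature of conj(f) * g over the unit cell.
lhs = (a / n) * sum(fv.conjugate() * gv for fv, gv in zip(f, g))

# Right-hand side: |Omega| times the sum of conj(f_hat) * g_hat.
f_hat = fourier_coefficients(f)
g_hat = fourier_coefficients(g)
rhs = a * sum(fh.conjugate() * gh for fh, gh in zip(f_hat, g_hat))
```

The quantities `lhs` and `rhs` should agree to rounding error. The explicit $\mathcal{O}(n^{2})$ transform is used only to keep the example dependency-free; in practice a fast Fourier transform would take its place.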

Both of the kernels $V,W$ satisfy the translation symmetry

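$$V(\mathbf{r}+\mathbf{R},\mathbf{r}^{\prime}+\mathbf{R})=V(\mathbf{r},\mathbf{r}^{\prime}),\quad W(\mathbf{r}+\mathbf{R},\mathbf{r}^{\prime}+\mathbf{R})=W(\mathbf{r},\mathbf{r}^{\prime}),\quad\forall\mathbf{R}\in\mathbb{L}.\tag{2.24}$$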

Eq. (2.24) also defines the values of $V,W$ for $\mathbf{r},\mathbf{r}^{\prime}$ beyond the supercell $\Omega^{\ell}$ . The Fourier representation of $V$ takes the form

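$$V(\mathbf{r},\mathbf{r}^{\prime})=\frac{1}{\lvert\Omega^{\ell}\rvert}\sum_{\mathbf{k}\in\mathcal{K}^{\ell}}\sum_{\mathbf{G},\mathbf{G}^{\prime}}e^{\mathrm{i}(\mathbf{k}+\mathbf{G})\cdot\mathbf{r}}\hat{V}_{\mathbf{k}}(\mathbf{G},\mathbf{G}^{\prime})e^{-\mathrm{i}(\mathbf{k}+\mathbf{G}^{\prime})\cdot\mathbf{r}^{\prime}},\tag{2.25}$$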

and the Fourier coefficients can be computed as

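$$\hat{V}_{\mathbf{k}}(\mathbf{G},\mathbf{G}^{\prime})=\frac{1}{\lvert\Omega^{\ell}\rvert}\int_{\Omega^{\ell}\times\Omega^{\ell}}\mathrm{d}\mathbf{r}\,\mathrm{d}\mathbf{r}^{\prime}\,e^{-\mathrm{i}(\mathbf{k}+\mathbf{G})\cdot\mathbf{r}}V(\mathbf{r},\mathbf{r}^{\prime})e^{\mathrm{i}(\mathbf{k}+\mathbf{G}^{\prime})\cdot\mathbf{r}^{\prime}}.\tag{2.26}$$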

Similarly, the Fourier representation for $W$ can be defined.

It should be noted that the Coulomb kernel $V$ only depends on the distance between $\mathbf{r}$ and $\mathbf{r}^{\prime}$ , i.e. it has further translational symmetry property that

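$$V(\mathbf{r}+\mathbf{r}^{\prime\prime},\mathbf{r}^{\prime}+\mathbf{r}^{\prime\prime})=V(\mathbf{r},\mathbf{r}^{\prime}),\quad\forall\mathbf{r}^{\prime\prime}\in\Omega^{\ell}.\tag{2.27}$$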

As a result, its Fourier transform $\hat{V}_{\mathbf{k}}(\mathbf{G},\mathbf{G}^{\prime})$ can be simplified into a diagonal matrix

[TABLE]

In fact, the Coulomb kernel periodized with respect to the supercell $\Omega^{\ell}$ is defined to be the inverse Fourier transform of Eq. (2.28).

Using such notation, we have

[TABLE]

Here we have used $e^{-\mathrm{i}\mathbf{G}^{\prime}\cdot\mathbf{R}}=1$ , the fact that $g$ is periodic with respect to the unit cell $\Omega$ , as well as the identity

[TABLE]

Furthermore, from Eq. (2.22) and the identity

[TABLE]

we have

[TABLE]

Compared to Eq. (2.28), the definition of $\hat{V}_{\mathbf{0}}$ should be modified to

[TABLE]

Another way to understand Eq. (2.32) is that it can only be applied to a mean zero function $g(\mathbf{r})$ , such that $\hat{g}(\mathbf{0})=0$ . In other words, $g$ should be in the range of the Laplacian operator with the periodic boundary condition. This is indeed correct for BSE calculations, due to the orthogonality condition between the valence and conduction bands

[TABLE]

This implies

[TABLE]
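The role of the mean-zero condition can be illustrated on a small periodic example. The sketch below is a one-dimensional analogue only: it assumes a kernel whose Fourier symbol is $1/G^{2}$ for $G\neq 0$ (the prefactor and the supercell details are dropped), and the names `apply_periodic_coulomb`, `L` and `n` are illustrative rather than taken from the paper. The product of two orthogonal periodic functions plays the role of a valence-conduction pair product.

```python
import numpy as np

def apply_periodic_coulomb(g, L):
    """Apply a 1D periodic Coulomb-like kernel with symbol 1/G^2 for G != 0.

    The G = 0 coefficient is set to zero, which loses no information
    only when g has zero mean over the cell.
    """
    n = g.size
    G = 2.0 * np.pi * np.fft.fftfreq(n, d=L / n)   # reciprocal lattice vectors
    symbol = np.zeros(n)
    symbol[G != 0] = 1.0 / G[G != 0] ** 2
    return np.fft.ifft(symbol * np.fft.fft(g))

L, n = 1.5, 64
x = np.arange(n) * L / n
# Product of two orthogonal periodic functions: its integral over the cell vanishes.
g = np.cos(2 * np.pi * x / L) * np.sin(4 * np.pi * x / L)
g_hat0 = np.fft.fft(g)[0] / n                      # G = 0 Fourier coefficient (mean of g)
u = apply_periodic_coulomb(g, L)

# Consistency check: applying G^2 to u recovers g because g has no G = 0 component.
G = 2.0 * np.pi * np.fft.fftfreq(n, d=L / n)
residual = np.fft.ifft(G**2 * np.fft.fft(u)) - g
```

If `g` had a nonzero mean, the same check would fail by exactly that mean, which is the sense in which the modified $\hat{V}_{\mathbf{0}}$ is only applicable to mean zero functions.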

Similarly for the $W$ part,

[TABLE]

In order to obtain a non-vanishing quantity in the equation above, note that the quantity $\sum_{\mathbf{R}\in\mathbb{L}}e^{-\mathrm{i}(\mathbf{k}-\mathbf{q})\cdot\mathbf{R}}=N_{k}$ if $\mathbf{k}-\mathbf{q}\in\mathbb{L}^{*}$, and is otherwise $0$. Therefore the summation with respect to $\mathbf{k}$ should be restricted to those satisfying

[TABLE]
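The lattice sum and the resulting restriction on $\mathbf{k}$ can be checked numerically. The following is a minimal one-dimensional sketch under stated assumptions: the lattice constant `a`, the grid size `n_k`, and the choice of $[0,b)$ as the Brillouin zone are illustrative, and `lattice_sum` is a hypothetical helper name.

```python
import cmath
import math

def lattice_sum(k, q, a, n_cells):
    """Sum of exp(-i (k - q) R) over R = 0, a, ..., (n_cells - 1) a."""
    return sum(cmath.exp(-1j * (k - q) * a * n) for n in range(n_cells))

a, n_k = 1.5, 8
b = 2 * math.pi / a                          # primitive reciprocal lattice vector
k_grid = [b * m / n_k for m in range(n_k)]   # k-points in [0, b)

# q lies outside [0, b); it differs from k_grid[3] by one reciprocal lattice vector.
q = b * 11 / n_k
sums = [lattice_sum(k, q, a, n_k) for k in k_grid]
matches = [m for m, s in enumerate(sums) if abs(s) > 0.5]
```

Here exactly one grid point survives (`matches` contains a single index), the corresponding sum equals `n_k`, and $\mathbf{k}-\mathbf{q}$ is a nonzero reciprocal lattice vector for that point.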

Since $\mathbf{k}$ is restricted to the first Brillouin zone, there is a unique $\mathbf{G}^{\prime\prime}$ (and therefore $\mathbf{k}$ ) for each given $\mathbf{q}$ satisfying this relation. Also note that $\mathbf{k}-\mathbf{q}$ may exceed the first Brillouin zone. In other words, it is indeed possible to have $\mathbf{G}^{\prime\prime}\neq\mathbf{0}$ . Then for a given $\mathbf{q}$ ,

[TABLE]

In the last equality, we have used the definition of the Fourier coefficients in Eq. (2.26). We then readily have

[TABLE]

Therefore, although $\mathcal{W}_{\mathbf{q}}(f,g)$ is significantly more complex to define, the resulting formula in the Fourier representation is remarkably similar to that of $\mathcal{V}(f,g)$.

3 Interpolative separable density fitting for periodic systems

In order to reduce the computational complexity, we seek to minimize the number of integrals in Equation (2.16). We will use the interpolative separable density fitting decomposition (ISDF) [19, 20]. For periodic systems, we first consider the following general form of decomposition

[TABLE]

When the unit cell is discretized into a uniform grid $\{\mathbf{r}_{n}\}_{n=1}^{N_{g}}$ , $Z$ can be viewed as a matrix with its row index being $\mathbf{r}$ , and the column index being a multi-index $(i\mathbf{k},j\mathbf{k}^{\prime})$ . The matrix size is thus $N_{g}\times N^{2}N_{k}^{2}$ (recall that $N=N_{v}+N_{c}$ ). For a given $\mathbf{r}$ , $u_{i\mathbf{k}}(\mathbf{r})\bar{u}_{j\mathbf{k}^{\prime}}(\mathbf{r})$ can be viewed as a row vector of size $N^{2}N_{k}^{2}$ . The ISDF decomposition then states that all such matrix rows can be approximately expanded using a linear combination of matrix rows with respect to a selected set of interpolation points $\{\hat{\mathbf{r}}_{\mu}\}_{\mu=1}^{N_{\mu}}\subset\{\mathbf{r}_{i}\}_{i=1}^{N_{g}}$ . The coefficients of such a linear combination, or interpolating vectors, are denoted by $\{\zeta_{\mu}(\mathbf{r})\}_{\mu=1}^{N_{\mu}}$ . Here $N_{\mu}$ can be interpreted as the numerical rank of the ISDF decomposition.

The compression of the pair products $u_{i\mathbf{k}}(\mathbf{r})\bar{u}_{j\mathbf{k}^{\prime}}(\mathbf{r})$ can be understood from the following two limits. First, if only the $\Gamma$ point is used to sample the Brillouin zone, we find that there are $N_{v}N_{c}\sim N^{2}$ pairs of functions. However, the number of grid points $N_{g}$ only scales linearly with respect to $N$. Hence the numerical rank of the pair products must scale asymptotically as $\mathcal{O}(N)$. In fact, when all orbitals are smooth functions, we can expect the numerical rank $N_{\mu}$ to be much lower than $N_{g}$. This statement has been confirmed by recent analysis [17]. Second, if a large number of $\mathbf{k}$-points are used to discretize the Brillouin zone, $N_{v},N_{c}$ are often relatively small, and the number of grid points in the unit cell $N_{g}$ does not increase with respect to $N_{k}$. Hence as $N_{k}$ increases, we may also expect that the numerical rank $N_{\mu}$ will be determined by the smoothness of $u$ with respect to $\mathbf{r},\mathbf{k}$, and is asymptotically independent of $N_{k}$. This is indeed what we observe in numerical results. Throughout the discussion below, we will focus on the second scenario, i.e. we will explicitly write down the scaling with respect to $N_{g},N$ and $N_{k}$, but we will primarily focus on the scaling with respect to $N_{k}$.
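The second limit can be illustrated with synthetic data. The sketch below does not use orbitals from any Hamiltonian: `model_orbitals` builds hypothetical band-limited periodic functions whose Fourier coefficients vary smoothly with $\mathbf{k}$, and all names and parameter values are assumptions made for illustration. Because each function contains only the modes $|m|\leq 2$, every pair product lies in the span of the 9 modes $|m|\leq 4$, so the numerical rank of $Z$ cannot exceed 9 however many $\mathbf{k}$-points are used.

```python
import numpy as np

def model_orbitals(n_g, n_bands, n_k, m_max=2):
    """Synthetic periodic functions u[(i, k), r], band-limited and smooth in k."""
    x = np.arange(n_g) / n_g                 # grid on a unit cell of length 1
    k = np.arange(n_k) / n_k                 # k in units of the reciprocal lattice vector
    m = np.arange(-m_max, m_max + 1)
    u = np.zeros((n_bands, n_k, n_g), dtype=complex)
    for i in range(n_bands):
        coeff = 1.0 / (1.0 + (m[None, :] + k[:, None] + 0.3 * i) ** 2)   # (n_k, n_modes)
        u[i] = coeff @ np.exp(2j * np.pi * m[:, None] * x[None, :])      # (n_k, n_g)
    return u.reshape(n_bands * n_k, n_g)

def pair_product_matrix(u):
    """Z[r, (ik, jk')] = u_ik(r) * conj(u_jk'(r))."""
    n_orb, n_g = u.shape
    z = u[:, None, :] * np.conj(u[None, :, :])       # (n_orb, n_orb, n_g)
    return z.reshape(n_orb * n_orb, n_g).T

def numerical_rank(z, tol=1e-10):
    """Number of singular values above tol relative to the largest one."""
    s = np.linalg.svd(z, compute_uv=False)
    return int(np.sum(s > tol * s[0]))

# The number of columns grows like n_k^2, the numerical rank stays bounded.
ranks = {n_k: numerical_rank(pair_product_matrix(model_orbitals(64, 3, n_k)))
         for n_k in (4, 8, 16)}
```

For realistic orbitals the bound is not exact, but the same mechanism (smoothness in $\mathbf{r}$ and $\mathbf{k}$) is what keeps $N_{\mu}$ from growing with $N_{k}$.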

Assuming the interpolation points $\{\hat{\mathbf{r}}_{\mu}\}_{\mu=1}^{N_{\mu}}$ are already chosen, the interpolation vectors can be efficiently evaluated using a least squares method as follows [11]. Using a linear algebra notation, Eq. (3.1) can be written as

[TABLE]

Here $\Theta=[\zeta_{1},\zeta_{2},...,\zeta_{N_{\mu}}]$ contains the interpolating vectors. Each column of $C$ indexed by $(i\mathbf{k},j\mathbf{k}^{\prime})$ is given by

[TABLE]

Eq. (3.2) is an over-determined linear system with respect to the interpolation vectors $\Theta$ . The least squares approximation to the solution is given by

[TABLE]

Due to the tensor product structure of $Z$ and $C$, the matrix-matrix multiplications $ZC^{*}$ and $CC^{*}$ can be carried out efficiently [11], with computational cost being $\mathcal{O}(N_{g}N_{\mu}NN_{k})$ and $\mathcal{O}(N^{2}_{\mu}NN_{k})$, respectively. The cost of inverting the matrix $CC^{*}$ is $\mathcal{O}(N_{\mu}^{3})$, and the overall cost of evaluating $\Theta$ is thus bounded by $\mathcal{O}(N_{g}N_{\mu}NN_{k}+N_{\mu}^{3}+N_{g}N_{\mu}^{2})$. Hence the cost scales cubically with respect to the number of electrons in the unit cell, and linearly with respect to the number of $\mathbf{k}$ points.
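The tensor product shortcut can be sketched as follows. This is an illustration on random data, not the authors' implementation: it assumes the least squares solution has the form $\Theta=ZC^{*}(CC^{*})^{-1}$ suggested by the products $ZC^{*}$ and $CC^{*}$ above, the interpolation points are picked at random, and `n_orb` stands in for $NN_{k}$. The key observation is that the double sum over $(i\mathbf{k},j\mathbf{k}^{\prime})$ factors into a product of two single sums, so neither $Z$ nor $C$ needs to be formed.

```python
import numpy as np

rng = np.random.default_rng(0)
n_g, n_orb, n_mu = 40, 6, 12        # n_orb plays the role of N * N_k

# Random stand-in for u_{ik}(r); rows are orbitals (ik), columns are grid points.
u = rng.standard_normal((n_orb, n_g)) + 1j * rng.standard_normal((n_orb, n_g))
points = rng.choice(n_g, size=n_mu, replace=False)   # interpolation points, assumed given
u_hat = u[:, points]                                 # orbitals at the interpolation points

# Factored products: each costs O(n_g * n_mu * n_orb) or O(n_mu^2 * n_orb).
p = u.T @ np.conj(u_hat)            # p[r, mu]      = sum_ik u_ik(r) conj(u_ik(r_mu))
p_hat = u_hat.T @ np.conj(u_hat)    # p_hat[nu, mu] = same sum with r restricted to points
zc = p * np.conj(p)                 # equals Z C^* without forming Z
cc = p_hat * np.conj(p_hat)         # equals C C^* without forming C
theta = np.linalg.solve(cc.T, zc.T).T    # Theta = (Z C^*) (C C^*)^{-1}

# Brute-force reference with explicit Z and C, feasible only for tiny sizes.
z = (u[:, None, :] * np.conj(u[None, :, :])).reshape(n_orb**2, n_g).T
c = z[points, :]
theta_ref = (z @ c.conj().T) @ np.linalg.inv(c @ c.conj().T)
rel_err = np.linalg.norm(theta @ c - z) / np.linalg.norm(z)   # relative Frobenius error
```

With random data the fit itself is poor (`rel_err` is not small), since random functions have no low-rank structure; the point of the sketch is only that the factored and brute-force formulas agree, and that $\Theta$ reduces to the identity on the interpolation points.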

Eq. (3.1) is the general form of ISDF. In the BSE calculations, we may further distinguish whether $i,j$ should take valence or conduction band indices only, as well as whether $\mathbf{k},\mathbf{k}^{\prime}$ can be set to be the same. For instance, Eq. (2.17) suggests that in order to compress $V_{A},V_{B}$ , we only need the following ISDF decomposition:

[TABLE]

Note that the number of columns of the matrix $Z^{V}$ is only $N_{v}N_{c}N_{k}$ , and the number of fitting functions $N^{V}_{\mu}$ can be chosen to be less than $N_{\mu}$ . The computation of $W_{A},W_{B}$ requires the general ISDF format (3.1).

The interpolation points $\{\hat{\mathbf{r}}_{\mu}\}_{\mu=1}^{N_{\mu}}$ can be chosen via a QR factorization with column pivoting (QRCP) method [8], with randomization to reduce the computational cost. We refer readers to [19, 20] for details of the randomized QRCP method for evaluating the interpolation points. Other methods can also be used to find the interpolation points, such as the method based on the centroidal Voronoi tessellation (CVT) [7].
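A minimal sketch of pivot-based point selection is given below. It is not the randomized QRCP of [19, 20]: the randomization step is omitted, the pivoting is implemented as a greedy Gram-Schmidt sweep (which makes the same pivot choices as QRCP in exact arithmetic), and the function name `select_points_qrcp` as well as the synthetic rank-5 test matrix are assumptions made for illustration.

```python
import numpy as np

def select_points_qrcp(z, n_mu):
    """Pick n_mu rows of Z (grid points) by column-pivoted Gram-Schmidt on Z^T.

    n_mu should not exceed the numerical rank of Z.
    """
    a = np.array(z.T, dtype=complex)       # columns of `a` correspond to grid points
    picked = []
    for _ in range(n_mu):
        norms = np.linalg.norm(a, axis=0)
        j = int(np.argmax(norms))          # pivot: column with the largest remaining norm
        picked.append(j)
        qcol = a[:, j] / norms[j]
        a -= np.outer(qcol, qcol.conj() @ a)   # remove the pivot direction from all columns
    return picked

# Synthetic Z of exact rank 5: rows are grid points, columns are combinations of 5 modes.
rng = np.random.default_rng(1)
n_g = 32
x = np.arange(n_g) / n_g
modes = np.exp(2j * np.pi * np.outer(x, np.arange(-2, 3)))            # (n_g, 5)
z = modes @ (rng.standard_normal((5, 30)) + 1j * rng.standard_normal((5, 30)))

points = select_points_qrcp(z, 5)
c = z[points, :]
theta = z @ c.conj().T @ np.linalg.inv(c @ c.conj().T)
rel_err = np.linalg.norm(theta @ c - z) / np.linalg.norm(z)
```

Since the test matrix has rank 5, five well-chosen rows reproduce all other rows up to rounding error, so `rel_err` is at the level of machine precision.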

4 Fast algorithm for applying the BSH to a vector

Once the ISDF decomposition is obtained, we may compute the following matrix elements

[TABLE]

and similarly

[TABLE]

The expressions in Eq. (2.17) can then be approximated in the ISDF format as

[TABLE]

In order to use the Fourier representation (2.33) and (2.36), we first need to perform Fourier transform for $\{\zeta_{\mu}^{V}\}$ and $\{\zeta_{\mu}\}$ . Using the fast Fourier transform (FFT), and assuming that the number of Fourier coefficients $\mathbf{G}$ is also $N_{g}$ , the computational cost for the Fourier transform scales as $\mathcal{O}(N_{\mu}^{V}N_{g}\log N_{g})$ and $\mathcal{O}(N_{\mu}N_{g}\log N_{g})$ , respectively. The Fourier coefficients $\hat{V}_{\mathbf{k}}$ can be obtained analytically, and we assume the coefficients $\hat{W}_{\mathbf{k}}$ are already provided from e.g. a GW calculation. The cost for computing $\widetilde{V}_{A},\widetilde{V}_{B}$ using Eq. (2.33) is then $\mathcal{O}((N_{\mu}^{V})^{2}N_{g})$ . Similarly the cost for computing all $\widetilde{W}_{\mathbf{q}}$ matrices is $\mathcal{O}(N_{\mu}^{2}N_{g}N_{k})$ . In particular, the total cost for the initial setup stage scales as $\mathcal{O}(N_{k})$ with respect to the number of $\mathbf{k}$ -points.

After this initial setup stage, each entry of the BSH can be computed with $\mathcal{O}((N_{\mu}^{V})^{2}+N_{\mu}^{2})$ operations. If the entire BSH matrix is to be constructed, the cost will be $\mathcal{O}(N_{\mu}^{2}N_{k}^{2}N_{v}^{2}N_{c}^{2})$ .

Below we demonstrate that if we only aim at applying the Hamiltonian $H_{\text{BSE}}$ to an arbitrary vector without ever assembling the full Hamiltonian, the computational cost can be greatly reduced.

For simplicity, let us focus on the case when the Tamm–Dancoff approximation (TDA) is used. Applying the Hamiltonian $H_{\text{BSE}}=D+2V_{A}-W_{A}$ to a vector $X\in\mathbb{C}^{N_{v}N_{c}N_{k}}$ amounts to evaluating the three terms

[TABLE]

Computing the first term for all $(i_{v}i_{c}\mathbf{k})$ clearly costs $\mathcal{O}(N_{v}N_{c}N_{k})$ operations. We now show that the second and third term can also be computed efficiently.

Using (4.3), the second term in (4.4) can be regrouped as

[TABLE]

This means that one can first perform contractions over $j_{v}$ , $j_{c}$ , and $\mathbf{k}^{\prime}$ to obtain a quantity which only depends on $\hat{\mathbf{r}}_{\nu}$ . The computational complexity is $\mathcal{O}(N_{\mu}^{V}(N_{v}N_{c}N_{k}+N_{c}N_{k}))$ . The two remaining sums can be computed with $\mathcal{O}((N_{\mu}^{V})^{2}+N_{\mu}^{V}N_{v}N_{c}N_{k})$ operations. The total complexity of computing $V_{A}X$ is bounded by $\mathcal{O}((N_{\mu}^{V})^{2}+N_{\mu}^{V}N_{v}N_{c}N_{k})$ .
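The regrouping can be sketched on random data as follows. The exact index and conjugation conventions of Eq. (4.3) are not reproduced here, so the factorization used below, $V_{A}\approx\Phi^{*}\widetilde{V}\Phi$ with $\Phi_{\nu,(j_{v}j_{c}\mathbf{k}^{\prime})}=\bar{u}_{j_{c}\mathbf{k}^{\prime}}(\hat{\mathbf{r}}_{\nu})u_{j_{v}\mathbf{k}^{\prime}}(\hat{\mathbf{r}}_{\nu})$, is an assumption, as are all array names. The point is the order of contractions, which never forms the $N_{v}N_{c}N_{k}\times N_{v}N_{c}N_{k}$ matrix.

```python
import numpy as np

rng = np.random.default_rng(2)
n_v, n_c, n_k, n_mu = 2, 3, 5, 4

def crandn(*shape):
    """Complex Gaussian random array (illustrative data only)."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

u_v = crandn(n_v, n_k, n_mu)     # valence functions at the interpolation points
u_c = crandn(n_c, n_k, n_mu)     # conduction functions at the interpolation points
v_tilde = crandn(n_mu, n_mu)     # small projected kernel from the setup stage
x = crandn(n_v, n_c, n_k)        # input vector, indexed by (i_v, i_c, k)

def apply_va(x):
    # Contract over j_v first: O(N_mu N_v N_c N_k) operations.
    tmp = np.einsum('vkn,vck->ckn', u_v, x)
    # Then over j_c and k': O(N_mu N_c N_k) operations, result depends on nu only.
    t = np.einsum('ckn,ckn->n', np.conj(u_c), tmp)
    # Small dense product: O(N_mu^2) operations.
    s = v_tilde @ t
    # Expand back to the (i_v, i_c, k) index: O(N_mu N_v N_c N_k) operations.
    return np.einsum('ckm,vkm,m->vck', u_c, np.conj(u_v), s)

y = apply_va(x)

# Dense reference, formed only to check the result on this tiny example.
phi = np.einsum('ckn,vkn->nvck', np.conj(u_c), u_v).reshape(n_mu, -1)
v_dense = phi.conj().T @ v_tilde @ phi
y_ref = (v_dense @ x.ravel()).reshape(x.shape)
```

The factored application touches the long index $(i_{v}i_{c}\mathbf{k})$ only twice, once on the way in and once on the way out, which is where the linear scaling in $N_{k}$ comes from.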

For the third term in (4.4) we obtain

[TABLE]

Here, the two innermost contractions over $j_{v}$ and $j_{c}$ result in a quantity that only depends on $\mathbf{k}$ , $\hat{\mathbf{r}}_{\mu}$ , and $\hat{\mathbf{r}}_{\nu}$ . The cost for these two steps is $\mathcal{O}(N_{\mu}N_{k}N_{v}N_{c}+N_{\mu}^{2}N_{k}N_{c})$ . The sum over $\mathbf{k}^{\prime}$ has the structure of a discrete convolution, for each fixed $\mu\nu$ pair. Therefore it can be computed for all $\mathbf{k}$ simultaneously in $\mathcal{O}(N_{\mu}^{2}N_{k}\log N_{k})$ operations by fast convolution algorithms, e.g., by using FFT with zero-padded vectors. The remaining summation operations over $\mu$ and $\nu$ are then obtained with $\mathcal{O}(N_{\mu}^{2}N_{c}N_{k}+N_{\mu}N_{v}N_{c}N_{k})$ operations. In total the computation of $W_{A}X$ amounts to $\mathcal{O}(N_{\mu}N_{v}N_{c}N_{k}+N_{\mu}^{2}N_{c}N_{k}+N_{\mu}^{2}N_{k}\log N_{k})$ operations.
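The convolution step can be sketched in isolation. The example below assumes a one-dimensional $\mathbf{k}$-grid and a kernel stored for all $2N_{k}-1$ differences $\mathbf{k}-\mathbf{k}^{\prime}$; the array layout and the name `toeplitz_convolve_fft` are illustrative choices, not the paper's data structures. Zero-padding to length $2N_{k}-1$ turns the sum over $\mathbf{k}^{\prime}$ into a circular convolution that the FFT evaluates for all $\mathbf{k}$ and all $\mu\nu$ pairs at once.

```python
import numpy as np

def toeplitz_convolve_fft(w, t):
    """Compute y[k] = sum_{k'} w[k - k'] * t[k'] for k, k' in 0..n_k-1.

    w has length 2*n_k - 1 along axis 0 and stores the differences
    d = k - k' in the order d = -(n_k-1), ..., 0, ..., n_k-1.
    Trailing axes (e.g. mu, nu) are treated independently.
    """
    n_k = t.shape[0]
    # Circulant embedding: position (d mod (2 n_k - 1)) holds w[d].
    w_circ = np.concatenate([w[n_k - 1:], w[:n_k - 1]], axis=0)
    pad = np.zeros((n_k - 1,) + t.shape[1:], dtype=t.dtype)
    t_pad = np.concatenate([t, pad], axis=0)
    y = np.fft.ifft(np.fft.fft(w_circ, axis=0) * np.fft.fft(t_pad, axis=0), axis=0)
    return y[:n_k]

rng = np.random.default_rng(4)
n_k, n_mu = 7, 3
w = rng.standard_normal((2 * n_k - 1, n_mu, n_mu)) + 1j * rng.standard_normal((2 * n_k - 1, n_mu, n_mu))
t = rng.standard_normal((n_k, n_mu, n_mu)) + 1j * rng.standard_normal((n_k, n_mu, n_mu))

y_fft = toeplitz_convolve_fft(w, t)

# Direct O(n_k^2) evaluation for comparison.
y_direct = np.array([sum(w[(k - kp) + n_k - 1] * t[kp] for kp in range(n_k))
                     for k in range(n_k)])
```

For a three-dimensional $\mathbf{k}$-grid the same idea applies with multidimensional FFTs along the three grid axes.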

Combining the results for the three parts of the Hamiltonian, we see that the computational complexity is given by

[TABLE]

In particular, the cost with respect to the number of $\mathbf{k}$ points only scales as $\mathcal{O}(N_{k}\log N_{k})$. This allows us to perform BSE calculations for complex materials which require a very large number of $\mathbf{k}$-points.

By avoiding the explicit construction of $H_{\text{BSE}}$ , the new algorithm also drastically reduces the storage cost. The storage cost for $H_{\text{BSE}}$ alone is $\mathcal{O}((N_{v}N_{c}N_{k})^{2})$ . In the new algorithm, the storage cost of $\hat{W}_{\mathbf{q}}$ becomes the dominant component and scales only linearly with respect to $N_{k}$ .
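To put the quadratic storage cost in concrete terms, the short calculation below evaluates it for the largest one-dimensional model of Section 5.1 ($N_{v}=4$, $N_{c}=5$, $N_{k}=4096$). The 16 bytes per entry assume complex double precision, which is an assumption about the storage format rather than a statement from the paper; `dense_bsh_bytes` is a hypothetical helper.

```python
def dense_bsh_bytes(n_v, n_c, n_k, bytes_per_entry=16):
    """Dimension and memory footprint of a dense (N_v N_c N_k)^2 matrix.

    bytes_per_entry=16 assumes complex double precision entries.
    """
    dim = n_v * n_c * n_k
    return dim, dim * dim * bytes_per_entry

dim, n_bytes = dense_bsh_bytes(4, 5, 4096)
gib = n_bytes / 2**30             # dense matrix size in GiB
vector_mib = dim * 16 / 2**20     # a single vector of the same dimension, in MiB
```

Under these assumptions the dense matrix would need 100 GiB, whereas one vector of the same dimension needs 1.25 MiB, which is why a matrix-free application matters even for this model problem.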

As an example, the matrix-free application of $H_{\text{BSE}}$ can be used to compute the optical absorption spectrum, which requires the evaluation of the following quantity

[TABLE]

Here $d_{r}$ and $d_{l}$ are called the right and left optical transition vectors, and $\eta$ is a broadening factor used to account for the exciton lifetime. We also compute the smallest eigenvalues of $H_{\text{BSE}}$, which are of interest in their own right, as they represent the transition energies of bound excitons in many semiconducting solid state materials.

To observe the absorption spectrum and identify its main peaks, it is possible to use a structure preserving iterative method instead of explicitly computing all eigenpairs of $H_{\text{BSE}}$. We refer readers to Refs. [5, 33] for details of the structure preserving Lanczos algorithm, which has been implemented in the BSEPACK [34] library. When TDA is used, the structure preserving Lanczos algorithm reduces to a standard Lanczos algorithm. For the computation of the first eigenvalue we use standard ARPACK [14] routines for Hermitian matrices.
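For the TDA case, the standard Lanczos route to a broadened spectrum can be sketched as below. This is a minimal illustration and not the BSEPACK algorithm: it assumes $d_{l}=d_{r}=d$ and a spectrum proportional to $-\operatorname{Im}\,d^{*}((\omega+\mathrm{i}\eta)I-H)^{-1}d$ (sign and prefactor conventions are assumptions), uses a small random Hermitian matrix in place of $H_{\text{BSE}}$, and applies full reorthogonalization for simplicity. Only matrix-vector products with $H$ are required.

```python
import numpy as np

def lanczos(matvec, d, n_steps):
    """Hermitian Lanczos with full reorthogonalization; returns (alpha, beta)."""
    basis = [d / np.linalg.norm(d)]
    alpha, beta = [], []
    for j in range(n_steps):
        w = matvec(basis[j])
        alpha.append(np.vdot(basis[j], w).real)
        for v in basis:                      # orthogonalize against all previous vectors
            w = w - np.vdot(v, w) * v
        b = np.linalg.norm(w)
        if j == n_steps - 1 or b < 1e-12:
            break
        beta.append(b)
        basis.append(w / b)
    return np.array(alpha), np.array(beta)

def resolvent_element(alpha, beta, norm_d, z):
    """||d||^2 e_1^T (z I - T)^{-1} e_1 as a continued fraction, evaluated bottom-up."""
    g = z - alpha[-1]
    for a, b in zip(alpha[-2::-1], beta[::-1]):
        g = z - a - b**2 / g
    return norm_d**2 / g

rng = np.random.default_rng(3)
n = 30
a = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
h = (a + a.conj().T) / 2                     # small Hermitian stand-in for H_BSE
d = rng.standard_normal(n) + 1j * rng.standard_normal(n)
eta = 0.1
omegas = np.linspace(-8.0, 8.0, 201)

alpha, beta = lanczos(lambda v: h @ v, d, n_steps=n)
norm_d = np.linalg.norm(d)
spectrum = np.array([-resolvent_element(alpha, beta, norm_d, w + 1j * eta).imag / np.pi
                     for w in omegas])

# Dense reference via full diagonalization, affordable only because n is tiny.
evals, evecs = np.linalg.eigh(h)
weights = np.abs(evecs.conj().T @ d) ** 2
reference = np.array([-(weights / (w + 1j * eta - evals)).sum().imag / np.pi
                      for w in omegas])
```

In this toy example the number of Lanczos steps equals the matrix dimension, so the two curves agree to rounding error; in practice far fewer steps than the dimension are used, and the broadening $\eta$ controls how many are needed to resolve the peaks.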

5 Numerical Examples

To illustrate the efficiency of ISDF for BSE calculations in crystals, we apply the method to compute the excitation modes and absorption spectra of a one-dimensional model problem as well as two real material systems, diamond (3D bulk) and graphene (quasi-2D). For both systems, we determine the optical absorption spectra on $\mathbf{k}$ -grids close to those employed in previously published calculations to demonstrate that our method is suitable for state-of-the-art calculations, both for 3D and quasi-2D materials. We furthermore provide a numerical scaling analysis and a more detailed analysis of the error in the ISDF in the case of the one-dimensional model and diamond. We show that a good approximation of the spectrum can be obtained with a small number of interpolation vectors.

The method was implemented in Julia [4] and the source code is available at github.com/fhenneke/BSE_k_ISDF.jl. As input to our method for the actual materials, we employ the KSDFT single-particle orbitals, quasi-particle energies and screened Coulomb potential computed by exciting [9, 36], an all-electron full-potential code with implementations of density functional theory and many-body perturbation theory. The Tamm–Dancoff approximation is used in all calculations.

All calculations for the proposed method were carried out on a single core of an i5-8250U CPU at 1.60GHz.

5.1 One-dimensional problems

For the one-dimensional problem, we take the single particle orbitals $\psi_{i\mathbf{k}}(\mathbf{r})$ in (2.16) to be eigenfunctions of a single particle Hamiltonian $\mathcal{H}(\mathbf{k})$ in which the effective potential is defined as

[TABLE]

where the unit cell size is $\lvert\Omega\rvert\equiv L=1.5$ .

The bare Coulomb potential used in (2.16) is chosen to be

[TABLE]

and the screened interaction is chosen as

[TABLE]

Compared to the smoothed out Coulomb potential $V$ , the chosen screened interaction $W$ decays exponentially and also contains lattice periodic contributions. The potentials are shown in Figure 5.1. Both potentials are periodically extended $N_{k}-1$ times outside of the unit cell. The particular structure of the potentials has an influence on the band structure and spectrum of the BSH, but was observed to not significantly impact the convergence behavior or the runtime scaling of the ISDF method.

The Bloch functions $u_{i\mathbf{k}}$ are sampled on $N_{g}=128$ uniformly distributed grid points within the unit cell, and the number of $\mathbf{k}$ points $N_{k}$ ranges from $16$ to $4096$ in our experiments.

For each $\mathbf{k}$ point, the first four eigenstates are treated as the valence states in this model, while the remaining eigenstates are considered as the conduction states, separated by an energy gap from the former. We use all $N_{v}=4$ valence bands and $N_{c}=5$ conduction bands to construct the approximate $H_{\textrm{BSE}}$ . The number of $\mathbf{k}$ points was chosen to be $N_{k}=256$ in the error analysis of the ISDF approximation, and varies from $16$ to $4096$ in the run time analysis and the analysis of the error in the absorption spectrum. The largest resulting Hamiltonian is of size $81920\times 81920$ .

Figure 5.2 shows how the ISDF approximation error varies with respect to the truncation parameter $N_{\mu}^{ij}$ and how the accuracy of the approximate spectrum of $H_{\textrm{BSE}}$ changes with respect to the ISDF approximation error.

In the left subfigure, we plot the relative error $\lVert\Theta^{\alpha\beta}C^{\alpha\beta}-Z^{\alpha\beta}\rVert_{F}/\lVert Z^{\alpha\beta}\rVert_{F}$, $\alpha,\beta\in\{v,c\}$, where $\|\cdot\|_{F}$ is the Frobenius norm, for different choices of truncation levels $N_{\mu}$ (or number of interpolation points). As expected, when $N_{\mu}$ is too small, ISDF results in a relatively large error. As $N_{\mu}$ becomes slightly larger, the ISDF approximation error decays exponentially with respect to $N_{\mu}$ up to $N_{\mu}=20\sim 30$. At this truncation level, the error is on the order of $10^{-8}$, which is sufficiently small for obtaining a highly accurate approximation of the spectrum of $H_{\textrm{BSE}}$, as shown in the right subfigure. In this subfigure, we plot the relative error in the first eigenvalue and in the overall optical absorption spectrum against the ISDF error tolerance $Z_{\textrm{tol}}$. For each $Z_{\textrm{tol}}$, we choose the smallest truncation parameters $N_{\mu}$ such that the resulting error in $Z^{\alpha\beta}$ is less than or equal to $Z_{\textrm{tol}}$ for $\alpha,\beta\in\{v,c\}$.

In Figure 5.3, we plot the timing measurements for both the construction of $\widetilde{V}$ and $\widetilde{W}$ and the multiplication of the approximate $H_{\textrm{BSE}}$ with a vector with respect to $N_{k}$ . In these calculations, the ISDF truncation parameters $N_{\mu}$ ’s are chosen so that the relative error in $Z^{\alpha\beta}$ is below $Z_{\textrm{tol}}=10^{-5}$ . This error tolerance resulted in the choices of $N_{\mu}^{vv}=17$ , $N_{\mu}^{cc}=23$ , and $N_{\mu}^{vc}=21$ .

As we can see in Figure 5.3, the scaling of the runtime for the construction of $\widetilde{V}$ and $\widetilde{W}$ is nearly linear with respect to $N_{k}$, which is in excellent agreement with the theoretical computational complexity presented in the preceding section. The scaling of the runtime for the multiplication of the approximate $H_{\textrm{BSE}}$ with a vector also looks linear in $N_{k}$. In fact, a more detailed investigation showed that the convolutions in $\mathbf{k}$ in the application of $W$ dominate the cost of the matrix-vector multiplications, in good agreement with the theoretical $\mathcal{O}(N_{k}\log N_{k})$ complexity shown earlier.

For comparison, without the use of ISDF, the construction of $H_{\textrm{BSE}}$ is estimated to take about $460,000$ seconds for $N_{k}=4096$ . With our method it took less than $10$ seconds.

5.2 Three-dimensional problems

We now compare optical absorption spectra for diamond and graphene computed from the approximate $H_{\textrm{BSE}}$ constructed via ISDF with corresponding reference spectra. The reference spectra are obtained from the exact $H_{\textrm{BSE}}$ from the exciting code [9, 36]. The comparison is shown in Figure 5.4. The reference spectrum for diamond is constructed on a $13\times 13\times 13$ $\mathbf{k}$ -grid using all 4 valence and 10 conduction states. Fourier components $\hat{W}_{\mathbf{q}}(\mathbf{G},\mathbf{G}^{\prime})$ in Eq. (2.35) are calculated up to a cut-off $|\mathbf{G}+\mathbf{q}|\leq 2.5\;\mathrm{a}_{0}^{-1}$ , where $\mathrm{a}_{0}$ is the Bohr radius. The screened Coulomb interaction is calculated within the random-phase approximation (RPA) including 100 conduction states. For graphene, the reference spectrum is obtained on a $42\times 42\times 1$ $\mathbf{k}$ -grid using all 4 valence and 5 conduction states. Fourier components $\hat{W}_{\mathbf{q}}(\mathbf{G},\mathbf{G}^{\prime})$ in Eq. (2.35) are calculated up to a cut-off $|\mathbf{G}+\mathbf{q}|\leq 2.0\;\mathrm{a}_{0}^{-1}$ and 80 conduction states are included in the RPA calculations for the screened Coulomb potential. The numerical parameters of the reference and approximate calculations are shown in Table 1. The number of interpolation vectors was chosen such that the relative ISDF error was around $0.1$ .

We can clearly see that for both diamond and graphene, the approximate optical absorption spectrum matches well with the reference spectrum. In particular, the positions and heights of all major peaks are in good agreement. We should note that, in the case of diamond, the absorption spectrum produced by a $13\times 13\times 13$ $\mathbf{k}$ -grid is in good agreement with measurements [25] and previous BSE calculations [10]. In the case of graphene, however, larger $\mathbf{k}$ -grids have been reported for BSE calculations [37] to produce an optical absorption spectrum in good agreement with the experimental result.

Figure 5.5 shows that the ISDF approximation error can be systematically reduced as we increase the number of interpolating vectors $N_{\mu}$. However, Figure 5.4 shows that the approximate absorption spectrum is already in good agreement with the reference spectrum when the relative ISDF approximation error is at $0.1$. Thus, it seems unnecessary to use a larger number of interpolation vectors in these cases. This observation is corroborated by the relative difference between the first eigenvalue of the approximate $H_{\textrm{BSE}}$ computed using ARPACK and that of the reference $H_{\textrm{BSE}}$ constructed in exciting, shown in Table 2. With a relative ISDF approximation error of $Z_{\textrm{tol}}=0.1$, the error in the first BSE eigenvalue is below $10\;\mathrm{meV}$ in both examples shown here.

To illustrate the run time scaling of the method in the 3D examples, we measure the time it takes to construct the approximate $H_{\textrm{BSE}}$ via ISDF as well as the time it takes to multiply the resulting $H_{\textrm{BSE}}$ with vectors for the diamond example. We use $\mathbf{k}$ -grids of sizes $N_{k}=n_{k}\times n_{k}\times n_{k}$ for $n_{k}\in\{2,3,4,5,7,9,13\}$ . The resulting timing measurements are plotted in Figure 5.6. It can be seen that the runtime for constructing the approximate $H_{\textrm{BSE}}$ scales linearly with the number of $\mathbf{k}$ -points. The multiplication of $H_{\textrm{BSE}}$ with vectors scales as $\mathcal{O}(N_{k}\log(N_{k}))$ for sufficiently large $N_{k}$ . As in the model problem, the convolutions in $\mathbf{k}$ in the application of $W$ dominate the cost of the matrix-vector multiplications. For comparison, computing the ISDF decomposition of the Hamiltonian for the case $N_{k}=13^{3}$ took $147$ seconds, whereas the full assembly of the Hamiltonian took about 6 hours in exciting on 13 compute nodes with 13 cores each. The optical absorption function was obtained by running about $150$ Lanczos steps, which amounts to about $24$ minutes for each fixed direction (x, y, and z), compared to almost 4 hours required in the exciting code for the full diagonalization on 13 compute nodes.

6 Conclusion

In this paper, we examined the possibility of using the ISDF technique to reduce the computational complexity of BSH construction and the subsequent iterative approximation of the optical absorption spectrum and excitation energies of electron-hole (exciton) pairs for solids. For periodic systems, a fine $\mathbf{k}$-point sampling in the Brillouin zone is often required to produce accurate results, whereas the number of bands per $\mathbf{k}$-point required to construct the bare exchange and screened direct kernels of the BSH is relatively small. We showed that the complexity of the ISDF procedure scales linearly with respect to the number of $\mathbf{k}$ points ($N_{k}$) when the ranks of the approximate bare exchange and screened direct kernels produced by the ISDF procedure are chosen to be independent of $N_{k}$. By keeping the bare exchange and screened direct kernels in the low-rank decomposed form produced by the ISDF procedure, an iterative method used to obtain the optical absorption spectrum and selected excitation energies (eigenvalues of the BSH) can be implemented with cost scaling as $\mathcal{O}(N_{k}\log N_{k})$. Our numerical experiments, which were performed on a 1D model as well as two different types of actual materials (diamond and graphene), confirm our complexity analysis. They demonstrate that the ISDF technique can indeed significantly reduce the cost of BSE calculations for solids while maintaining the same accuracy provided by a standard BSE calculation implemented in the software exciting. Our current implementation of the ISDF technique is written in the Julia programming language for a single node. A distributed parallel implementation is needed to accommodate the much finer $\mathbf{k}$-point sampling that is required in the case of the graphene example to produce a computed absorption spectrum that matches experimental results.

Acknowledgments

This work was partially supported by the Department of Energy under grant DE-SC0017867 (L.L.), by the Center for Computational Study of Excited-State Phenomena in Energy Materials (C2SEPEM) at the Lawrence Berkeley National Laboratory, which is funded by the U. S. Department of Energy, Office of Science, Basic Energy Sciences, Materials Sciences and Engineering Division, under Contract No. DE-AC02-05CH11231 (C.Y.), by the Scientific Discovery through Advanced Computing (SciDAC) program, and by the CAMERA program (L.L. and C.Y.). Within the framework of a cooperation between the University of California at Berkeley and Freie Universität Berlin, the latter sponsored an extended visit of F.H. and R.K. in Berkeley. We thank Wei Hu, Meiyue Shao and Kyle Thicke for helpful discussions. C.D. and R.K. thank IPAM, UCLA, for its support during the 2013 fall program on “Materials for a sustainable energy future” and for creating the inspiring scientific atmosphere that initiated their collaboration.

Bibliography


[1] Stefan Albrecht, Giovanni Onida, and Lucia Reining, Ab initio calculation of the quasiparticle spectrum and excitonic effects in Li$_{2}$O, Phys. Rev. B, 55 (1997), pp. 10278–10281.
[2] Neil W. Ashcroft and David N. Mermin, Solid state physics, Harcourt, New York, 1976.
[3] Peter Benner, Sergey Dolgov, Venera Khoromskaia, and Boris N. Khoromskij, Fast iterative solution of the Bethe–Salpeter eigenvalue problem using low-rank and QTT tensor approximation, J. Comput. Phys., 334 (2017), pp. 221–239.
[4] J. Bezanson, A. Edelman, S. Karpinski, and V. Shah, Julia: A fresh approach to numerical computing, SIAM Review, 59 (2017), pp. 65–98.
[5] J. Brabec, L. Lin, M. Shao, N. Govind, Y. Saad, C. Yang, and E. G. Ng, Efficient algorithms for estimating the absorption spectrum within linear response TDDFT, J. Chem. Theory Comput., 11 (2015), pp. 5197–5208.
[6] J. Deslippe, G. Samsonidze, D. A. Strubbe, M. Jain, M. L. Cohen, and S. G. Louie, Berkeley GW: A massively parallel computer package for the calculation of the quasiparticle and optical properties of materials and nanostructures, Comput. Phys. Commun., 183 (2012), pp. 1269–1289.
[7] K. Dong, W. Hu, and L. Lin, Interpolative separable density fitting through centroidal Voronoi tessellation with applications to hybrid functional electronic structure calculations, J. Chem. Theory Comput., 14 (2018), p. 1311.
[8] G. H. Golub and C. F. Van Loan, Matrix computations, Johns Hopkins Univ. Press, Baltimore, fourth ed., 2013.