Towards optimal cosmological parameter recovery from compressed bispectrum statistics

Joyce Byun; Alexander Eggemeier; Donough Regan; David Seery; Robert E. Smith

arXiv:1705.04392·astro-ph.CO·July 19, 2017

TL;DR

This paper explores compressed bispectrum statistics as proxies to improve cosmological parameter constraints from large-scale structure surveys, aiming to reduce covariance complexity while maintaining information.

Contribution

It demonstrates that modal bispectrum and other proxies can match the Fourier bispectrum's effectiveness with fewer configurations, simplifying analysis without significant information loss.

Findings

01

Modal bispectrum performs as well as Fourier bispectrum with fewer modes.

02

Adding bispectrum data improves constraints on the bias parameters and $\sigma_8$ by a factor of 3 to 5 compared with the power spectrum alone.

03

Constraints on the other parameters can improve by up to $\sim$20% when the bispectrum or a proxy is added.

Abstract

Over the next decade, improvements in cosmological parameter constraints will be driven by surveys of large-scale structure. Its inherent non-linearity suggests that significant information will be embedded in higher correlations beyond the two-point function. Extracting this information is extremely challenging: it requires accurate theoretical modelling and significant computational resources to estimate the covariance matrix describing correlations between different Fourier configurations. We investigate whether it is possible to reduce the covariance matrix without significant loss of information by using a proxy that aggregates the bispectrum over a subset of Fourier configurations. Specifically, we study the constraints on $\Lambda$CDM parameters from combining the power spectrum with (a) the modal bispectrum decomposition, (b) the line correlation function and (c) the integrated…

Tables

Table 1: Fiducial values of the cosmological parameters, together with the step size $\Delta\theta$ used to vary each parameter in the simulations. We perform one simulation with offset $+\Delta\theta$ and one with offset $-\Delta\theta$, giving two offset simulations per parameter. With seven parameters and four realizations per model this gives $4+2\times 7\times 4=60$ simulations in the suite. The bias parameters are assumed to be $b_{1}=1$ and $b_{2}=0$.

Parameter $\theta$	$\Omega_{m}$	$\Omega_{b}$	$w_{0}$	$w_{a}$	$\sigma_{8}$	$n_{s}$	$h$
Fiducial value	$0.25$	$0.040$	$-1.0$	$0.0$	$0.8$	$1.00$	$0.70$
$\Delta\theta$	$\pm 0.05$	$\pm 0.005$	$\pm 0.2$	$\pm 0.1$	$\pm 0.1$	$\pm 0.05$	$\pm 0.05$
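The bookkeeping in the caption of Table 1 can be checked with a short Python sketch. The suite size follows directly from the caption. The central-difference derivative is an assumption about how paired $\pm\Delta\theta$ simulations are typically used, not a statement of the authors' exact procedure, and the function names are illustrative only.

```python
def suite_size(n_params, n_realizations):
    """Fiducial runs plus two offset runs (+/- step) per parameter, per realization."""
    fiducial_runs = n_realizations
    offset_runs = 2 * n_params * n_realizations
    return fiducial_runs + offset_runs


def central_difference(obs_plus, obs_minus, step):
    """Assumed estimator: d(obs)/d(theta) ~ [obs(+step) - obs(-step)] / (2 * step)."""
    return [(p - m) / (2.0 * step) for p, m in zip(obs_plus, obs_minus)]


# Seven parameters and four realizations, as stated in the caption.
print("simulations in suite:", suite_size(7, 4))

# Hypothetical measurements of some statistic in two bins at theta +/- 0.1.
print("derivative estimate:", central_difference([1.2, 2.4], [0.8, 1.6], 0.1))
```

Averaging such derivative estimates over the four realizations would reduce their noise, which is one plausible reason for running every offset model with matched realizations.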

Table 2: Shell widths $\Delta k$ used to average estimators for the power spectrum and bispectrum (where used), together with minimum and maximum modes $k_{\text{min}}$, $k_{\text{max}}$ and the total number of bins or measurements $N_{\text{bin}}$.

	$\Delta k$ [$h\,\text{Mpc}^{-1}$]	$k_{\text{min}}$ [$h\,\text{Mpc}^{-1}$]	$k_{\text{max}}$ [$h\,\text{Mpc}^{-1}$]	$N_{\text{bin}}$
$P$	$0.010$	$0.004$	$0.300$	$30$
$B$	$0.034$	$0.004$	$0.302$	$95$
$\beta$	$-$	$0.004$	$0.302$	$50$
$ib$	$0.010$	$0.021$	$0.306$	$29$
$\ell$	$-$	$0.016$	$0.314$	$30$

Table 3: Marginalized $1\sigma$ parameter uncertainties for the power spectrum and its combination with a 3-point correlation measure, including CMB priors. All quoted values are derived from the measured covariance matrices and parameter derivatives with $k_{\text{max}}=0.3\,h\,\text{Mpc}^{-1}$. The percentages in parentheses refer to the improvement over the $P$-only results.

	$P$	$P+B$		$P+\beta$ ($n_{\text{max}}=50$)		$P+\beta$ ($n_{\text{max}}=10$)		$P+\ell$		$P+ib$	
$\Omega_{m}$	$0.00179$	$0.00140$	(22%)	$0.00141$	(21%)	$0.00144$	(19%)	$0.00172$	(4%)	$0.00167$	(7%)
$\Omega_{b}$	$0.00015$	$0.00014$	(5%)	$0.00014$	(5%)	$0.00014$	(4%)	$0.00015$	(2%)	$0.00015$	(1%)
$w_{0}$	$0.084$	$0.070$	(16%)	$0.068$	(19%)	$0.069$	(17%)	$0.076$	(9%)	$0.082$	(2%)
$w_{a}$	$0.370$	$0.315$	(15%)	$0.306$	(17%)	$0.310$	(16%)	$0.338$	(9%)	$0.360$	(3%)
$\sigma_{8}$	$0.0092$	$0.0023$	(75%)	$0.0024$	(74%)	$0.0025$	(73%)	$0.0043$	(53%)	$0.0090$	(2%)
$n_{s}$	$0.00327$	$0.00284$	(13%)	$0.00281$	(14%)	$0.00284$	(13%)	$0.00303$	(7%)	$0.00323$	(1%)
$h$	$0.00103$	$0.00087$	(15%)	$0.00086$	(16%)	$0.00087$	(15%)	$0.00095$	(7%)	$0.00101$	(2%)
$b_{1}$	$0.0103$	$0.0020$	(81%)	$0.0021$	(79%)	$0.0022$	(79%)	$0.0032$	(68%)	$0.0100$	(3%)
$b_{2}$	$0.0100$	$0.0031$	(69%)	$0.0031$	(69%)	$0.0031$	(69%)	$0.0085$	(15%)	$0.0100$	(1%)
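The percentages in parentheses, and the "factor between 3 and 5" quoted in the abstract for $\sigma_{8}$ and the bias parameters, can be recomputed from the tabulated uncertainties. The Python sketch below uses values copied from the $P$ and $P+B$ columns of Table 3; the helper names are illustrative, and because the tabulated uncertainties are rounded, the recomputed percentages may differ from the published ones in the last digit.

```python
def improvement_percent(sigma_p, sigma_combined):
    """Fractional reduction of the 1-sigma uncertainty, in percent."""
    return 100.0 * (1.0 - sigma_combined / sigma_p)


def improvement_factor(sigma_p, sigma_combined):
    """Ratio of the P-only uncertainty to the combined uncertainty."""
    return sigma_p / sigma_combined


# (P-only, P+B) marginalized uncertainties copied from Table 3.
TABLE3_P_VS_PB = {
    "Omega_m": (0.00179, 0.00140),
    "sigma_8": (0.0092, 0.0023),
    "b_1": (0.0103, 0.0020),
    "b_2": (0.0100, 0.0031),
}

for name, (sigma_p, sigma_pb) in TABLE3_P_VS_PB.items():
    pct = improvement_percent(sigma_p, sigma_pb)
    fac = improvement_factor(sigma_p, sigma_pb)
    print(f"{name}: {pct:.0f}% smaller error bar, factor {fac:.1f}")
```

A reduction of 75% corresponds to a factor of 4 in the error bar, which is why the same result can be quoted either way.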

Table 4: Percent improvement of unmarginalized constraints using $P+B$ compared to $P$ only at $z=0$.

$\Omega_{m}$	$\Omega_{b}$	$w_{0}$	$w_{a}$	$\sigma_{8}$	$n_{s}$	$h$	$b_{1}$	$b_{2}$
12.9%	19.4%	26.0%	27.0%	26.4%	15.1%	15.6%	42.4%	43.4%

Equations

\langle\delta(\bm{\mathrm{k}}_{1})\delta(\bm{\mathrm{k}}_{2})\rangle

\langle\delta(\bm{\mathrm{k}}_{1})\delta(\bm{\mathrm{k}}_{2})\delta(\bm{\mathrm{k}}_{3})\rangle

iB(k)\equiv\int\frac{\text{d}^{2}\hat{k}}{4\pi}\big{\langle}P(\bm{\mathrm{k}},\bm{\mathrm{r}}_{L})\bar{\delta}(\bm{\mathrm{r}}_{L})\big{\rangle}_{N_{s}}.

\langle P(\bm{\mathrm{k}},\bm{\mathrm{r}}_{L})\bar{\delta}(\bm{\mathrm{r}}_{L})\rangle_{N_{s}}=\bigg{\langle}\bigg{[}P(\bm{\mathrm{k}})\big{|}_{\bar{\delta}=0}+\frac{\text{d}P(\bm{\mathrm{k}})}{\text{d}\bar{\delta}}\bigg{|}_{\bar{\delta}=0}\bar{\delta}(\bm{\mathrm{r}}_{L})+\cdots\bigg{]}\bar{\delta}(\bm{\mathrm{r}}_{L})\bigg{\rangle}_{N_{s}}\approx\frac{\text{d}\ln P(\bm{\mathrm{k}})}{\text{d}\bar{\delta}}\bigg{|}_{\bar{\delta}=0}P(\bm{\mathrm{k}})\sigma^{2}_{L}\ ,

ib(k)\equiv\frac{iB(k)}{P(k)\sigma_{L}^{2}}\approx\frac{\text{d}\ln P(k)}{\text{d}\bar{\delta}}\bigg{|}_{\bar{\delta}=0},

\epsilon_{r}(\bm{\mathrm{x}})=\int\frac{\text{d}^{3}k}{(2\pi)^{3}}\epsilon(\bm{\mathrm{k}})\mathrm{e}^{\mathrm{i}\bm{\mathrm{k}}\cdot\bm{\mathrm{x}}}W(k|r)\equiv\int\frac{\text{d}^{3}k}{(2\pi)^{3}}\frac{\delta(\bm{\mathrm{k}})}{|\delta(\bm{\mathrm{k}})|}\mathrm{e}^{\mathrm{i}\bm{\mathrm{k}}\cdot\bm{\mathrm{x}}}W(k|r),

\ell(r)\equiv\frac{V^{3}}{(2\pi)^{9}}\left(\frac{r^{3}}{V}\right)^{3/2}\int\frac{\text{d}^{2}\hat{r}}{4\pi}\big{\langle}\epsilon_{r}(\bm{\mathrm{x}})\epsilon_{r}(\bm{\mathrm{x}}+\bm{\mathrm{r}})\epsilon_{r}(\bm{\mathrm{x}}-\bm{\mathrm{r}})\big{\rangle},

B(k_{1},k_{2},k_{3})\approx B_{\text{modal}}(k_{1},k_{2},k_{3})\equiv\frac{1}{w(k_{1},k_{2},k_{3})}\sum_{n=0}^{n_{\text{max}}-1}\beta_{n}^{Q}Q_{n}(k_{1},k_{2},k_{3}),

B(k_{1},k_{2},k_{3})\approx\frac{1}{w(k_{1},k_{2},k_{3})}\sum_{n=0}^{n_{\text{max}}-1}\beta_{n}^{Q}Q_{n}(k_{1},k_{2},k_{3})=\frac{1}{w(k_{1},k_{2},k_{3})}\sum_{n=0}^{n_{\text{max}}-1}\beta_{n}^{R}R_{n}(k_{1},k_{2},k_{3}).

\operatorname{\mathrm{Cov}_{\mathrm{G}}}\big{[}P(k_{i}),P(k_{j})\big{]}\approx\bm{1}_{ij}\frac{2k_{\mathrm{f}}^{3}}{4\pi k_{i}^{2}\Delta k}P^{2}(k_{i}),

\operatorname{\mathrm{Cov}_{\mathrm{G}}}\big{[}B(k_{1},k_{2},k_{3}),B(q_{1},q_{2},q_{3})\big{]}\approx\bm{1}_{k,q}\frac{N\pi k_{\mathrm{f}}^{3}}{k_{1}k_{2}k_{3}(\Delta k)^{3}}P(k_{1})P(k_{2})P(k_{3}).

\delta(\bm{\mathrm{k}},\bm{\mathrm{r}}_{L})=\int\frac{\text{d}^{3}q}{(2\pi)^{3}}\delta(\bm{\mathrm{k}}-\bm{\mathrm{q}})W_{L}(\bm{\mathrm{q}})\mathrm{e}^{-\mathrm{i}\bm{\mathrm{q}}\cdot\bm{\mathrm{r}}_{L}},

iB^{\text{theory}}(k)=\frac{1}{V_{s}^{2}}\int\frac{\text{d}^{2}\hat{k}}{4\pi}\int\frac{\text{d}^{3}q_{1}}{(2\pi)^{3}}\int\frac{\text{d}^{3}q_{2}}{(2\pi)^{3}}B^{\text{theory}}(\bm{\mathrm{k}}-\bm{\mathrm{q}}_{1},-\bm{\mathrm{k}}+\bm{\mathrm{q}}_{1}+\bm{\mathrm{q}}_{2},-\bm{\mathrm{q}}_{2})W_{L}(\bm{\mathrm{q}}_{1})W_{L}(-\bm{\mathrm{q}}_{1}-\bm{\mathrm{q}}_{2})W_{L}(\bm{\mathrm{q}}_{2}).

\frac{\text{d}\ln P^{\text{halo}}(k)}{\text{d}\bar{\delta}}=\frac{13}{21}\frac{\text{d}\ln P^{\text{halo}}(k)}{\text{d}\ln\sigma_{8}}+2-\frac{1}{3}\frac{\text{d}\ln k^{3}P^{\text{halo}}(k)}{\text{d}\ln k},

\operatorname{\mathrm{Cov}_{\mathrm{G}}}\big{[}ib(k_{i}),ib(k_{j})\big{]}=\frac{V_{s}}{VN_{ks}}\frac{1}{\sigma_{L}^{2}}\bm{1}_{ij}.

\ell^{\text{theory}}(r)\simeq\Big{(}\frac{r}{4\pi}\Big{)}^{9/2}\iint\displaylimits_{\begin{subarray}{c}|\bm{\mathrm{k}}_{1}|,|\bm{\mathrm{k}}_{2}|,\\ |\bm{\mathrm{k}}_{1}+\bm{\mathrm{k}}_{2}|\leqslant 2\pi/r\end{subarray}}\text{d}^{3}k_{1}\,\text{d}^{3}k_{2}\;B^{\text{theory}}_{\epsilon}(k_{1},k_{2},k_{3})j_{0}\big{(}\left|\bm{\mathrm{k}}_{1}-\bm{\mathrm{k}}_{2}\right|r\big{)}\,,

B_{\epsilon}(k_{1},k_{2},k_{3})\equiv\frac{B(k_{1},k_{2},k_{3})}{\sqrt{P(k_{1})P(k_{2})P(k_{3})}}

\begin{split}\ell^{\text{tree}}(r)=\mbox{}&16\pi^{2}\Big{(}\frac{r}{4\pi}\Big{)}^{9/2}\int_{0}^{\frac{2\pi}{r}}\text{d}k_{1}\,k_{1}^{2}\int_{0}^{\frac{2\pi}{r}}\text{d}k_{2}\,k_{2}^{2}\int_{-1}^{\mu_{\mathrm{cut}}}\text{d}\mu\,F_{2}^{(s)}(k_{1},\,k_{2},\,\mu)\,\sqrt{\frac{P^{\text{tree}}(k_{1})P^{\text{tree}}(k_{2})}{P^{\text{tree}}(|\bm{\mathrm{k}}_{1}+\bm{\mathrm{k}}_{2}|)}}\,\\ &\mbox{}\times\left[j_{0}{\big{(}\left|\bm{\mathrm{k}}_{2}-\bm{\mathrm{k}}_{1}\right|r\big{)}}+2j_{0}{\big{(}\left|\bm{\mathrm{k}}_{1}+2\bm{\mathrm{k}}_{2}\right|r\big{)}}\right]\,,\end{split}

\mu_{\mathrm{cut}}=\min\left\{1,\max\left\{-1,\frac{(2\pi/r)^{2}-k_{1}^{2}-k_{2}^{2}}{2k_{1}k_{2}}\right\}\right\}.

\langle\epsilon(\bm{\mathrm{k}}_{1})\epsilon(\bm{\mathrm{k}}_{2})\rangle=\frac{(2\pi)^{3}}{V}\delta_{\text{D}}(\bm{\mathrm{k}}_{1}+\bm{\mathrm{k}}_{2}).

\operatorname{\mathrm{Cov}_{\mathrm{G}}}\big{[}\ell(r_{i}),\ell(r_{j})\big{]}=\frac{(r_{i}r_{j})^{9/2}}{V^{3}}\iint\displaylimits_{\begin{subarray}{c}|\bm{\mathrm{k}}_{1}|,|\bm{\mathrm{k}}_{2}|,\\ |\bm{\mathrm{k}}_{1}+\bm{\mathrm{k}}_{2}|\leqslant 2\pi/r\end{subarray}}\frac{\text{d}^{3}k_{1}}{k_{\mathrm{f}}^{3}}\,\frac{\text{d}^{3}k_{2}}{k_{\mathrm{f}}^{3}}\Big{(}j_{0}(|2\bm{\mathrm{k}}_{1}+\bm{\mathrm{k}}_{2}|\,r_{i})\big{[}2j_{0}(|\bm{\mathrm{k}}_{1}-\bm{\mathrm{k}}_{2}|\,r_{j})+j_{0}(|2\bm{\mathrm{k}}_{1}+\bm{\mathrm{k}}_{2}|\,r_{j})\big{]}+r_{i}\leftrightarrow r_{j}\Big{)},

6\left(\frac{S}{N}\right)_{Q_{n}}^{2}=\int\frac{\text{d}^{3}k_{1}}{(2\pi)^{3}}\frac{\text{d}^{3}k_{2}}{(2\pi)^{3}}\frac{\text{d}^{3}k_{3}}{(2\pi)^{3}}(2\pi)^{3}\frac{\delta_{\text{D}}(\bm{\mathrm{k}}_{1}+\bm{\mathrm{k}}_{2}+\bm{\mathrm{k}}_{3})}{w(k_{1},k_{2},k_{3})^{2}}\frac{Q_{n}(k_{1},k_{2},k_{3})^{2}}{P(k_{1})P(k_{2})P(k_{3})}.

w(k_{1},k_{2},k_{3})=\sqrt{\frac{k_{1}k_{2}k_{3}}{P(k_{1})P(k_{2})P(k_{3})}},

6\left(\frac{S}{N}\right)_{Q_{n}}^{2}=\langle\!\langle Q_{n}|Q_{n}\rangle\!\rangle.

\langle\!\langle f|g\rangle\!\rangle\equiv\int\frac{\text{d}^{3}k_{1}}{(2\pi)^{3}}\frac{\text{d}^{3}k_{2}}{(2\pi)^{3}}\frac{\text{d}^{3}k_{3}}{(2\pi)^{3}}(2\pi)^{3}\delta_{\text{D}}(\bm{\mathrm{k}}_{1}+\bm{\mathrm{k}}_{2}+\bm{\mathrm{k}}_{3})\frac{f(k_{1},k_{2},k_{3})g(k_{1},k_{2},k_{3})}{k_{1}k_{2}k_{3}}

\langle\!\langle f|g\rangle\!\rangle\equiv\frac{1}{8\pi^{4}}\int_{\mathcal{V}}\text{d}k_{1}\,\text{d}k_{2}\,\text{d}k_{3}\,f(k_{1},k_{2},k_{3})g(k_{1},k_{2},k_{3}).

\langle\!\langle Q_{m}|Q_{n}\rangle\!\rangle\equiv\gamma_{mn}\equiv\frac{(k_{\text{max}}-k_{\text{min}})^{3}}{8\pi^{4}}\bar{\gamma}_{mn}.

\langle\!\langle R_{m}|R_{n}\rangle\!\rangle=\frac{(k_{\text{max}}-k_{\text{min}})^{3}}{8\pi^{4}}\bm{1}_{mn}.

\langle\!\langle wB^{\text{theory}}|Q_{m}\rangle\!\rangle=\sum_{n=0}^{n_{\text{max}}-1}\beta_{n}^{Q,\text{theory}}\gamma_{nm}.

\beta_{n}^{Q,\text{theory}}=\frac{1}{8\pi^{4}}\sum_{m}\gamma_{nm}^{-1}\int_{\mathcal{V}}\text{d}k_{1}\,\text{d}k_{2}\,\text{d}k_{3}\,\sqrt{k_{1}k_{2}k_{3}}\,B_{\epsilon}^{\text{theory}}(k_{1},k_{2},k_{3})Q_{m}(k_{1},k_{2},k_{3}),

\begin{split}\langle\beta_{m}^{R}\beta_{n}^{R}\rangle&=(2\pi)^{3}\delta_{\text{D}}(\bm{0})\frac{6}{V^{2}}\frac{(8\pi^{4})^{2}}{(k_{\text{max}}-k_{\text{min}})^{6}}\int\frac{\text{d}^{3}k_{1}\,\text{d}^{3}k_{2}\,\text{d}^{3}k_{3}}{(2\pi)^{9}}(2\pi)^{3}\delta_{\text{D}}(\bm{\mathrm{k}}_{1}+\bm{\mathrm{k}}_{2}+\bm{\mathrm{k}}_{3})\frac{R_{m}(k_{1},k_{2},k_{3})R_{n}(k_{1},k_{2},k_{3})}{k_{1}k_{2}k_{3}}\\ &=\frac{6}{V}\frac{(8\pi^{4})^{2}}{(k_{\text{max}}-k_{\text{min}})^{6}}\langle\!\langle R_{m}|R_{n}\rangle\!\rangle.\end{split}

Full text

Towards optimal cosmological parameter recovery from compressed bispectrum statistics

Joyce Byun, Alexander Eggemeier, Donough Regan, David Seery & Robert E. Smith

Astronomy Centre, School of Mathematical and Physical Sciences, University of Sussex, Brighton BN1 9QH, United Kingdom

Abstract

Over the next decade, improvements in cosmological parameter constraints will be driven by surveys of large-scale structure in the Universe. The information they contain is encoded in a hierarchy of correlation functions, and tools to utilize the two-point function are already well-developed. But the inherent non-linearity of large-scale structure suggests that further information will be embedded in higher correlations, of which the bispectrum is currently the most accessible. Extracting this information is extremely challenging: it requires accurate theoretical modelling and significant computational resources to estimate the covariance matrix describing correlations between different configurations of Fourier modes. We investigate whether it is possible to reduce the covariance matrix without significant loss of information by using a proxy that aggregates the bispectrum over a subset of Fourier configurations. Specifically, we study the constraints on $\Lambda$CDM parameters from combining the power spectrum with (a) the modal decomposition of the bispectrum, (b) the line correlation function and (c) the integrated bispectrum. We forecast the error bars achievable on $\Lambda$CDM parameters in a future galaxy survey that measures one of these proxies and compare them to those obtained from measurements of the Fourier bispectrum, including simple estimates of their degradation in the presence of shot noise. Our results demonstrate that the modal bispectrum performs as well as the Fourier bispectrum, even with considerably fewer modes than Fourier configurations. The line correlation function has good performance but does not match the modal bispectrum. The integrated bispectrum is comparatively insensitive to changes in the background cosmology. We find that the addition of bispectrum data can improve constraints on bias parameters and the normalization $\sigma_{8}$ by a factor between 3 and 5 compared to power spectrum measurements alone. For other parameters, improvements of up to $\sim$20% are possible. Finally, we use a range of theoretical models to explore how the sophistication required for realistic predictions varies with each proxy.

keywords:

Cosmology: theory, Large-scale structure of the Universe

1 Introduction

Constraints on cosmological parameters have improved significantly over the last two decades, driven by high-precision data from the cosmic microwave background (‘CMB’) temperature and polarization anisotropies (Bennett et al., 2003; Ade et al., 2014). But the capacity of CMB observations to sustain this rate of progress is now nearly exhausted. Measurements of the temperature anisotropy have become limited by cosmic variance down to very small scales, and therefore future large-scale measurements will furnish little new information. Meanwhile, on small scales, cosmological information begins to be erased by astrophysical processes. Modest improvements may still come from better polarization data, perhaps shrinking current uncertainties by a factor of a few, but eventually these measurements will also approach the limit of cosmic variance. Further progress will be possible only with new sources of information. In the decade 2020–2030 we expect such a source to be provided by surveys of cosmological large-scale structure—but only if the information these surveys contain can be extracted and understood (Silk, 2016).

**The bispectrum: challenges.—**The statistical information contained in a galaxy survey is carried by its hierarchy of correlation functions, of which typically only a few lowest-order functions can be measured accurately. Tools to extract information from the two-point function were developed early and are now mature. The development of tools to extract information from higher-order correlation functions has proceeded more slowly (Fry, 1984; Goroff et al., 1986; Scoccimarro, 2000; Sefusatti et al., 2006), but because structure formation is non-linear it is likely that these carry an important fraction of the information content. To make good use of our investment in costly observational programmes it will be necessary to find a means of using information from at least the three-point function.

What are the challenges? A first difficulty arises from combinatorics. We write the matter overdensity at time $t$ as $\delta(\bm{\mathrm{x}},t)=\delta\rho(\bm{\mathrm{x}},t)/\bar{\rho}(t)$, where $\delta\rho(\bm{\mathrm{x}},t)=\rho(\bm{\mathrm{x}},t)-\bar{\rho}(t)$ is the density perturbation and $\bar{\rho}(t)$ is the uniform background. Allowing angle brackets $\langle\cdots\rangle$ to denote an ensemble average, statistical homogeneity makes the two- and three-point functions $\langle\delta(\bm{\mathrm{x}})\delta(\bm{\mathrm{x}}+\bm{\mathrm{r}})\rangle$ and $\langle\delta(\bm{\mathrm{x}})\delta(\bm{\mathrm{x}}+\bm{\mathrm{r}}_{1})\delta(\bm{\mathrm{x}}+\bm{\mathrm{r}}_{2})\rangle$ independent of the origin $\bm{\mathrm{x}}$. After translation to Fourier space this enforces conservation of momentum for the wavenumbers that participate in the expectation value,

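$$\langle\delta(\bm{\mathrm{k}}_{1})\delta(\bm{\mathrm{k}}_{2})\rangle=(2\pi)^{3}\delta_{\text{D}}(\bm{\mathrm{k}}_{1}+\bm{\mathrm{k}}_{2})P(k),\qquad\text{(1a)}$$

$$\langle\delta(\bm{\mathrm{k}}_{1})\delta(\bm{\mathrm{k}}_{2})\delta(\bm{\mathrm{k}}_{3})\rangle=(2\pi)^{3}\delta_{\text{D}}(\bm{\mathrm{k}}_{1}+\bm{\mathrm{k}}_{2}+\bm{\mathrm{k}}_{3})B(k_{1},k_{2},k_{3}),\qquad\text{(1b)}$$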

where $k=|\bm{\mathrm{k}}_{1}|=|\bm{\mathrm{k}}_{2}|$ is the common magnitude of the wavenumbers appearing in the two-point function. In Equations (1a)–(1b) and the remainder of this paper we suppress the time $t$ labelling the hypersurface of evaluation. Isotropy makes the power spectrum $P$ a function only of $k$, while the bispectrum $B$ is a function of the three wavenumbers $k_{1}$, $k_{2}$, $k_{3}$ subject to the closure condition $\bm{\mathrm{k}}_{1}+\bm{\mathrm{k}}_{2}+\bm{\mathrm{k}}_{3}=0$. Therefore a fixed volume of space yields many more distinct configurations of the bispectrum than of the spectrum. If we choose to measure all of them then we must provide an estimate for their covariance, and beyond the Gaussian approximation this typically requires N-body simulations. Since we require at least as many simulations as the number of independent covariances, the number of simulations to be performed grows at least linearly in the number of configurations. This makes it very expensive to use more than a fraction of the available bispectrum measurements.
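The combinatorial growth can be illustrated with a short Python sketch. It makes simplifying assumptions that are not taken from the paper: bins are linearly spaced with centres at integer multiples of the bin width, and the closure condition is tested on the bin centres only. The helper `count_triangles` is hypothetical, and the counts it returns are not expected to reproduce the binning of Table 2.

```python
def count_triangles(n_bins):
    """Count triples k1 >= k2 >= k3 of bin centres that can close into a triangle.

    Bin centres are taken to be 1, 2, ..., n_bins in units of the bin width.
    """
    count = 0
    for i in range(1, n_bins + 1):      # k1: longest side
        for j in range(1, i + 1):       # k2: intermediate side
            for l in range(1, j + 1):   # k3: shortest side
                if i <= j + l:          # triangle inequality (closure) on bin centres
                    count += 1
    return count


for n in (5, 10, 20, 30):
    print(n, "power-spectrum bins ->", count_triangles(n), "bispectrum configurations")
```

The power spectrum needs one measurement per bin, whereas the number of closed triangles grows roughly as the cube of the number of bins, and the covariance matrix grows as the square of that count.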

Second, we must estimate typical values for $B(k_{1},k_{2},k_{3})$ in a particular cosmological model. While such estimates are already necessary for the power spectrum $P(k)$, accurate estimates for the bispectrum are substantially more challenging. There are two key reasons. No matter what methods we use, the algebraic complexity associated with high-order correlation functions is usually worse than at lower order. Also, many of our standard tools have a reduced range of validity as we move up the correlation hierarchy. We must therefore work harder to obtain trustworthy predictions from our models, and in some cases we can do so only by giving up analytic methods altogether.

These problems have hampered the development of a toolkit that would make use of bispectrum measurements routine. Nevertheless, they are difficulties of practice and not obstructions of principle: if necessary, we could determine both covariances and typical values of $P$ or $B$ from N-body simulations, at least over a certain range of scales. But such determinations would require a very large number of realizations. The sheer computational resource entailed by this strategy makes it unattractive on timescales of interest for surveys such as Euclid, DESI, or LSST.

**Alternative strategies.—**To build a practical methodology we must cut the size of the covariance matrices and avoid simulations where possible. Simulations are not needed when analytic methods suffice to predict $P$ or $B$ , or when a Gaussian approximation to the covariance is acceptable. Meanwhile, an obvious way to reduce the number of configurations is simply not to measure them all. Depending how aggressively we choose to cut, this may mean accepting a significant loss of information. A more nuanced option is to aggregate groups of configurations into weighted averages, effectively compressing the data carried by the bispectrum rather than discarding it. Such averages could be computed directly. But there are also observables whose statistics can naturally be expressed as weighted averages of this kind. Measuring these will often be simpler than measuring amplitudes of the Fourier bispectrum—simultaneously reducing the effort required to estimate and invert their covariance matrices. We describe these observables as ‘proxies’ or ‘proxy statistics’ for the full Fourier bispectrum.
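The idea of compressing by weighted averages can be sketched in a few lines of Python. If the full data vector $d$ has covariance $C$ and the proxy is $d'=Wd$ for some weight matrix $W$, then the proxy covariance is $WCW^{T}$, which is smaller whenever $W$ has fewer rows than columns. The weights and numbers below are hypothetical toy values, not any of the proxies studied in the paper, and the helper names are illustrative.

```python
def compress_data(weights, data):
    """Apply d' = W d, where each row of `weights` defines one weighted average."""
    return [sum(w * x for w, x in zip(row, data)) for row in weights]


def compress_covariance(weights, cov):
    """Propagate the covariance: C' = W C W^T."""
    n = len(cov)
    m = len(weights)
    # First form W C, then multiply by W^T.
    wc = [[sum(weights[a][i] * cov[i][j] for i in range(n)) for j in range(n)]
          for a in range(m)]
    return [[sum(wc[a][j] * weights[b][j] for j in range(n)) for b in range(m)]
            for a in range(m)]


# Toy example: four configurations averaged pairwise into two proxy measurements.
weights = [[0.5, 0.5, 0.0, 0.0],
           [0.0, 0.0, 0.5, 0.5]]
data = [1.0, 3.0, 5.0, 7.0]
cov = [[4.0 if i == j else 0.0 for j in range(4)] for i in range(4)]

print("compressed data:", compress_data(weights, data))
print("compressed covariance:", compress_covariance(weights, cov))
```

In this toy case a 4×4 covariance matrix is replaced by a 2×2 one. Whether information is lost depends on how well the rows of $W$ align with the directions in which the data respond to the parameters of interest.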

Each proxy represents a compromise between (a) information loss due to compression, (b) the type of Fourier configurations over which it aggregates, and therefore the physics to which it is sensitive, and (c) its accessibility to analytical modelling, either for covariances or to estimate typical measurements. In this paper we select three proxies that have already been described in the literature and characterize their performance in each of these categories. Our aim is not to find an optimal proxy for any particular measurement, but rather to demonstrate that their use represents a feasible strategy for upcoming surveys without unacceptable degradation in information recovery.

**Summary.—**Our principal results are forecasts for the parameter error bars achievable from combinations of the galaxy power spectrum and bispectrum, or its proxies. The parameter set we study comprises the background quantities of a $\Lambda$CDM model with evolving dark energy, supplemented by two parameters describing the bias model (McDonald & Roy, 2009). We study how these forecasts change when they are estimated using the complete non-Gaussian covariance matrix or its Gaussian approximation. We characterize their dependence on the method used to predict typical values for $P(k)$ and $B(k_{1},k_{2},k_{3})$ by sampling the results using tree-level and one-loop standard perturbation theory (‘SPT’), and an implementation of the halo model. We compare these estimates with values measured directly from simulations. These results can be used to determine, for each observable, the degree of modelling sophistication that is required to obtain accurate forecasts.

Our analysis does not include the effect of survey geometry or incompleteness, or redshift-space effects, and should be regarded as a determination of the performance of each proxy under idealized conditions. We include a simple analysis that indicates how our results would change in the presence of shot noise.

Fisher forecasts including Fourier bispectrum measurements have previously been reported by Sefusatti et al. (2006), assuming $1{,}015$ bispectrum configurations and measuring covariances from a suite of $6{,}000$ mock catalogues generated by the PTHalos algorithm (Scoccimarro & Sheth, 2002) and second-order Lagrangian perturbation theory (‘2LPT’). Their results suggested that the bispectrum contains significant cosmological information. For comparison, in our analysis we use $95$ bispectrum configurations in order to keep the size of the covariance matrix within plausible bounds, and measure it directly from a suite of full N-body simulations.

More recently, Chan & Blot (2016) estimated the extra constraining power of Fourier bispectrum measurements by computing their contribution to the signal-to-noise, but did not make forecasts for error bars on cosmological parameters. They found that the bispectrum contributed up to a $\sim 30\%$ increase in signal-to-noise above the power spectrum and concluded that the information gain would be modest, perhaps being principally useful to break degeneracies. One of our aims is to clarify the relationship between this conclusion and the more nuanced outcomes found by Sefusatti et al. (2006). We find that estimates based on signal-to-noise alone generally give only a rough indication compared to the full Fisher calculation because they do not account for variations in the sensitivity to background cosmology between observables.
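The distinction between signal-to-noise and Fisher information can be made concrete with a toy Python sketch. It assumes a single parameter $\theta$ and a diagonal (uncorrelated, Gaussian) covariance, so that $(S/N)^{2}=\sum_{i}\mu_{i}^{2}/\sigma_{i}^{2}$ and $F=\sum_{i}(\partial\mu_{i}/\partial\theta)^{2}/\sigma_{i}^{2}$. The two "statistics" and all numbers are hypothetical and chosen only to make the point.

```python
import math


def signal_to_noise_sq(mean, sigma):
    """(S/N)^2 for uncorrelated measurements with means `mean` and errors `sigma`."""
    return sum((m / s) ** 2 for m, s in zip(mean, sigma))


def fisher_single_parameter(dmean_dtheta, sigma):
    """Fisher information on one parameter for uncorrelated Gaussian errors."""
    return sum((d / s) ** 2 for d, s in zip(dmean_dtheta, sigma))


def forecast_error(fisher):
    """Forecast 1-sigma error on the parameter from the Fisher information."""
    return 1.0 / math.sqrt(fisher)


sigma = [1.0, 1.0]
# Statistic A responds strongly to theta; statistic B barely responds.
mean_a, dmean_a = [10.0, 10.0], [5.0, 5.0]
mean_b, dmean_b = [10.0, 10.0], [0.5, 0.5]

for label, mean, dmean in (("A", mean_a, dmean_a), ("B", mean_b, dmean_b)):
    snr2 = signal_to_noise_sq(mean, sigma)
    err = forecast_error(fisher_single_parameter(dmean, sigma))
    print(f"statistic {label}: (S/N)^2 = {snr2:.0f}, forecast error = {err:.3f}")
```

Both toy statistics are detected with the same significance, yet their forecast errors differ by a factor of ten, because only the Fisher calculation weights each measurement by its sensitivity to the parameter.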

**Organization.—**Our presentation is organized as follows. In Section 2 we introduce the three bispectrum proxies to be studied in the remainder of the paper. These are: (a) the modal bispectrum, which can be regarded as an alternative to the Fourier bispectrum obtained by exchanging the Fourier modes $\mathrm{e}^{\mathrm{i}\bm{\mathrm{k}}\cdot\bm{\mathrm{x}}}$ for an alternative basis (Fergusson et al., 2012; Regan et al., 2012); (b) the line correlation function, which samples three-point statistics of the phase of the density fluctuation (Obreschkow et al., 2013; Wolstenhulme et al., 2015), and (c) the integrated bispectrum (Chiang et al., 2014), which measures variation of the power spectrum in subsampled regions. Each of these measures can be expressed as a weighted average over particular configurations of the Fourier bispectrum.

In Sections 3.1–3.3 we explain how each proxy can be predicted using the halo model or a flavour of SPT. In Section 3.4 we explain our prescription to obtain the biased galaxy density field from the underlying matter density field, which is the quantity predicted by these analytic models. In Section 4 we describe our procedure to recover estimates for each proxy statistic from N-body simulations, and in Section 5 we compare these estimates (and estimates for their derivatives with respect to the cosmological parameters) with theoretical predictions. Readers familiar with the measures of 3-point correlations described in Section 2 and the modelling technologies of Section 3 may choose to begin reading at this point. In Section 6 we present signal-to-noise estimates for the information content of each proxy. Our Fisher forecasts appear in Section 7. In Section 8 we collect a number of topics for discussion, including the compression efficiency of each proxy statistic and the impact of shot noise on our forecasts. We conclude in Section 9.

**Notation.—**Our Fourier convention is $f(\bm{\mathrm{x}})=\int\text{d}^{3}k\,(2\pi)^{-3}f(\bm{\mathrm{k}})\mathrm{e}^{\mathrm{i}\bm{\mathrm{k}}\cdot\bm{\mathrm{x}}}$. To avoid confusion we distinguish the Dirac $\delta$-function $\delta_{\text{D}}(\bm{\mathrm{x}})$ or $\delta_{\text{D}}(\bm{\mathrm{k}})$ and the Kronecker symbol $\bm{1}_{ij}$ from the matter overdensity $\delta\equiv\delta\rho/\bar{\rho}$.

2 The Fourier bispectrum and its proxies

In this section we introduce the proxy statistics to which we compare the Fourier bispectrum. This has already been defined—together with the power spectrum—in Equations (1a)–(1b). We describe the integrated bispectrum in Section 2.1, the line correlation function in Section 2.2 and the modal decomposition of the bispectrum in Section 2.3. Each of these represents a possible compression of the Fourier bispectrum, in the sense described in Section 1.

2.1 Integrated bispectrum

The integrated bispectrum (or ‘position-dependent power spectrum’) was developed by Chiang et al. (2014) as a tool to search for primordial non-Gaussianity in large-scale structure. It has several convenient features: it is easily estimated using standard power-spectrum codes and it has a clear physical interpretation. As we shall see in Section 3.1, it represents a weighted average of the Fourier bispectrum dominated by ‘squeezed’ configurations—that is, wavenumbers $(\bm{\mathrm{k}}_{1},\bm{\mathrm{k}}_{2},\bm{\mathrm{k}}_{3})$ where one $k_{i}$ is much smaller than the other two. If we assume $k_{3}\ll k_{1},k_{2}$ then the bispectrum $\langle\delta(\bm{\mathrm{k}}_{1})\delta(\bm{\mathrm{k}}_{2})\delta(\bm{\mathrm{k}}_{3})\rangle$ expresses correlations between a single long-wavelength mode $\delta(\bm{\mathrm{k}}_{3})$ and the two-point function $\langle\delta(\bm{\mathrm{k}}_{1})\delta(\bm{\mathrm{k}}_{2})\rangle$ . This makes it sensitive to ‘local-type’ non-Gaussianity produced by inflationary models with more than one active field. However, because gravitational collapse correlates modes with comparable wavenumbers, the bispectrum produced during mass assembly is typically concentrated away from squeezed configurations. For this reason it is not clear how sensitive the integrated bispectrum might be to the cosmological parameters that influence this assembly process.

To define the integrated bispectrum, divide the total survey volume into $N_{s}$ cubic subvolumes, each of volume $V_{s}\equiv L_{s}^{3}$ and centred at positions $\bm{\mathrm{r}}_{L}$ . Compute the power spectrum and average overdensity for each subvolume, which we denote $P(\bm{\mathrm{k}},\bm{\mathrm{r}}_{L})$ and $\bar{\delta}(\bm{\mathrm{r}}_{L})$ , respectively. (The power spectrum $P(\bm{\mathrm{k}},\bm{\mathrm{r}}_{L})$ may depend on the orientation of $\bm{\mathrm{k}}$ if the subvolumes are not isotropic.) Finally, the integrated bispectrum is defined to be the expectation of $P(\bm{\mathrm{k}},\bm{\mathrm{r}}_{L})\bar{\delta}(\bm{\mathrm{r}}_{L})$ , averaged over the orientation of $\bm{\mathrm{k}}$ ,

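$$iB(k)\equiv\int\frac{\text{d}^{2}\hat{\bm{\mathrm{k}}}}{4\pi}\,\big\langle P(\bm{\mathrm{k}},\bm{\mathrm{r}}_{L})\,\bar{\delta}(\bm{\mathrm{r}}_{L})\big\rangle_{N_{s}}.\qquad(2)$$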

The notation $\langle\cdots\rangle_{N_{s}}$ indicates that the expectation is to be taken over all subvolumes.

To compute this expectation we Taylor expand $P(\bm{\mathrm{k}},\bm{\mathrm{r}}_{L})$ in powers of $\bar{\delta}(\bm{\mathrm{r}}_{L})$ (Chiang et al., 2014). The leading contribution is

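$$iB(k)\approx\sigma^{2}_{L}\,\left.\frac{\text{d}P(k)}{\text{d}\bar{\delta}}\right|_{\bar{\delta}=0},\qquad(3)$$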

where $\sigma^{2}_{L}\equiv\langle\bar{\delta}^{2}(\bm{\mathrm{r}}_{L})\rangle_{N_{s}}$ is the variance in mean overdensity over the subvolumes. Therefore, at lowest order, the integrated bispectrum describes variation of the power spectrum in response to changes in the large-scale overdensity. (Footnote 1: In field theory this is the ‘operator product expansion’.) We conclude that measurements of $iB$ contain both the power spectrum and its variance. Since these can be measured directly, any new information contained in the integrated bispectrum must reside in its normalized component (Chiang et al., 2014),

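$$ib(k)\equiv\frac{iB(k)}{P(k)\,\sigma^{2}_{L}}\approx\frac{\text{d}\ln P(k)}{\text{d}\bar{\delta}},\qquad(4)$$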

where the second approximate equality applies when only the lowest-order contribution from the Taylor expansion need be retained. This is the linear response approximation. The quantity $\text{d}\ln P(k)/\text{d}\bar{\delta}$ is the linear response function and provides a good approximation to $ib$ for large $k$ .
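As an illustration of the definitions in this subsection, the following Python sketch forms the normalized integrated bispectrum at a single wavenumber from per-subvolume measurements. The function name `normalized_ib` and the toy inputs are assumptions made for this example, and the subvolume power spectra and mean overdensities are taken as given; it is not the estimator pipeline applied to the simulations.

```python
def normalized_ib(p_sub, dbar_sub):
    """Estimate ib at one k from per-subvolume P(k, r_L) and mean overdensity."""
    n = len(p_sub)
    if n == 0 or n != len(dbar_sub):
        raise ValueError("need one power spectrum value per subvolume")
    # iB: average of P(k, r_L) * dbar(r_L) over all subvolumes
    i_b = sum(p * d for p, d in zip(p_sub, dbar_sub)) / n
    # sigma_L^2: mean of dbar^2 over subvolumes, as in the definition above
    sigma2 = sum(d * d for d in dbar_sub) / n
    # normalize by the mean subvolume power spectrum and the variance
    p_mean = sum(p_sub) / n
    return i_b / (p_mean * sigma2)


# Toy data (hypothetical): power responds linearly to the local overdensity.
dbar = [-0.02, -0.01, 0.01, 0.02]
response = 1.5
p_sub = [100.0 * (1.0 + response * d) for d in dbar]
ib = normalized_ib(p_sub, dbar)
```

In this toy case the subvolume power spectrum responds exactly linearly to $\bar{\delta}$ , so the estimate recovers the assumed response, which is the content of the linear response approximation.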

2.2 Line correlation function

Equation (1a) shows that the power spectrum is sensitive only to information carried by the amplitude of each Fourier mode. In contrast, higher-order statistics generally encode information carried by both amplitudes and phases. Phase correlations are an exclusive signature of non-Gaussian density fields. For instance, they may arise through processes in the primordial Universe or from mode coupling in the non-linear regime of gravitational collapse. Therefore, unlike the amplitudes, phases directly probe cosmological information that is absent from the two-point function.

With this motivation, Obreschkow et al. (2013) proposed the line correlation function (often abbreviated as ‘LCF’). It measures a subset of three-point phase correlations of the density field—specifically, correlations between collinear points, each separated by a distance $r$ . Obreschkow et al. (2013) demonstrated that the LCF is a robust tracer of filamentary structures, and showed that it could be used as a phenomenological tool to distinguish between cold and warm dark matter scenarios. Subsequent work established its connection to conventional higher-order statistics (Wolstenhulme et al., 2015; Eggemeier et al., 2015; Eggemeier & Smith, 2017).

The line correlation function can be understood as follows: for a given density field $\delta(\bm{\mathrm{x}})$ in some volume $V$ , its real-space phase field $\epsilon_{r}(\bm{\mathrm{x}})$ smoothed on a scale $r$ satisfies

[TABLE]

where $W(k|r)$ is the Fourier transform of the smoothing window function. We take this to be a spherical top-hat in $k$ -space, $W(k|r)\equiv\Theta(1-k\,r/2\pi)$ , where $\Theta(x)$ denotes the Heaviside step function. The phase at $\bm{\mathrm{k}}=0$ is defined so that $\epsilon(\bm{\mathrm{0}})\equiv 0$ . Following Obreschkow et al. (2013) the LCF is defined by

[TABLE]

where the factor $V^{3}/(2\pi)^{9}$ represents a volume regularization. After taking Fourier transforms we require the three-point function of the $\epsilon_{r}(\bm{\mathrm{k}})$ in order to evaluate this integral. Wolstenhulme et al. (2015) and Eggemeier & Smith (2017) demonstrated that, at lowest order in the expansion of the probability density function for Fourier phases, this three-point function is directly related to the Fourier bispectrum. Therefore the LCF must contain some fraction of the information in $B$ , but because $\ell(r)$ is an average over specific collinear configurations it represents a compression. Specifically, the number of LCF bins will vary linearly with changes in the effective cut-off on Fourier modes.
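A minimal Python sketch of the phase-field ingredients may help fix ideas. It assumes that smoothing acts multiplicatively in Fourier space, so that each mode contributes $W(k|r)\,\delta(\bm{\mathrm{k}})/|\delta(\bm{\mathrm{k}})|$ , and it treats the Heaviside function as inclusive at the cut-off; both choices, and the function names, are assumptions of this example.

```python
import math


def tophat_window(k, r):
    """W(k|r) = Theta(1 - k r / 2 pi): keep modes with k <= 2 pi / r."""
    return 1.0 if k * r <= 2.0 * math.pi else 0.0


def smoothed_phase(delta_k, k, r):
    """Return W(k|r) * delta(k) / |delta(k)|, with the k = 0 phase set to zero."""
    if k == 0.0 or delta_k == 0:
        return 0.0 + 0.0j
    return tophat_window(k, r) * delta_k / abs(delta_k)


# A retained mode keeps only its phase; its amplitude information is discarded.
eps_kept = smoothed_phase(3.0 + 4.0j, k=0.1, r=10.0)
# A mode above the cut-off 2 pi / r is removed entirely.
eps_cut = smoothed_phase(3.0 + 4.0j, k=1.0, r=10.0)
```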

2.3 Modal bispectrum

Our final proxy is a ‘modal’ expansion of the three-point function. This is very similar to the Fourier bispectrum, except that we exchange the Fourier basis $\mathrm{e}^{\mathrm{i}\bm{\mathrm{x}}\cdot\bm{\mathrm{k}}}$ for a set of alternative modes that are better adapted to the structure of $B$ . The exchange is helpful if we can represent the bispectrum to the same accuracy using fewer modes than required by the Fourier representation. This approach was originally developed by Fergusson & Shellard (2009) and Regan et al. (2010) to analyse microwave background data, and subsequently applied to large-scale structure by Fergusson, Regan & Shellard (2012) and Regan et al. (2012).

In the alternative basis we represent the Fourier bispectrum in the form

[TABLE]

where the $Q_{n}$ are basis functions that span the space of configurations compatible with a triangle condition on $(k_{1},k_{2},k_{3})$ , but can otherwise be chosen freely provided they are linearly independent. The $\beta_{n}^{Q}$ are numbers that we describe as ‘modal coefficients’. They can be regarded as averages of the Fourier bispectrum over a set of configurations picked out by the corresponding $Q_{n}$ . The function $w(k_{1},k_{2},k_{3})$ is an arbitrary weight that will be chosen in Section 3.3.

If the $Q_{n}$ form a complete basis we expect $B$ and $B_{\text{modal}}$ to become equivalent in the limit $n_{\text{max}}\rightarrow\infty$ . In this limit the modal expansion is merely a reorganization of the Fourier representation. But if we select the lowest $Q_{n}$ to average over the most relevant Fourier configurations then it may be possible to represent a typical $B$ using only a small number of modes (see footnote 2). Taking $n_{\text{max}}$ to be of order this number, the outcome yields useful compression whenever $n_{\text{max}}\ll N_{\text{triangles}}$ , where $N_{\text{triangles}}$ is the number of Fourier configurations contained in the volume under discussion. At least for reasonably smooth bispectra, Schmittfull, Regan & Shellard (2013) found that this could be done with no more than modest loss of signal.

Footnote 2: Here, ‘most relevant’ is defined by the features of the bispectrum for which we wish to search. For example, inspection of the formulae appearing in Sections 3.1–3.2 below shows that both the integrated bispectrum and line correlation function can be regarded as instances of (7), with $Q_{n}$ adjusted to prioritize specific groups of Fourier configurations. For these cases, however, the resulting $Q$ -basis is not complete. In this paper we distinguish the modal decomposition, for which the $Q$ -basis is intended to be complete, from proxies such as $ib$ and $\ell$ which are intended to be projections.

**Orthonormal basis.—**Given a choice of $Q_{n}$ we may redefine the basis by taking arbitrary linear combinations. For example, we will use this freedom in Section 3.3 to obtain a basis for which the $\beta$ -coefficients are uncorrelated. The covariance matrix in this redefined basis is especially simple.

Such a redefinition can be performed using an invertible matrix $\lambda_{mn}$ . We define $R_{n}\equiv\sum_{m}\lambda_{nm}^{-1}Q_{m}$ . The $\beta$ -coefficients in the $R$ -basis now satisfy $\beta_{n}^{R}\equiv\sum_{m}\lambda_{mn}\beta_{m}^{Q}$ . Since the $Q$ - and $R$ -bases are reorganizations of each other, the modal bispectrum defined using either basis is equivalent,

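$$B_{\text{modal}}(k_{1},k_{2},k_{3})=\frac{1}{w(k_{1},k_{2},k_{3})}\sum_{n}\beta_{n}^{Q}Q_{n}(k_{1},k_{2},k_{3})=\frac{1}{w(k_{1},k_{2},k_{3})}\sum_{n}\beta_{n}^{R}R_{n}(k_{1},k_{2},k_{3}).\qquad(8)$$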

3 Predicting typical values and covariances for the proxies

In this section we explain how to obtain predictions for the typical values and covariances of $ib(k)$ , $\ell(r)$ and $\beta^{R}_{m}$ in a given cosmological model. This can be done with different degrees of sophistication, corresponding—for example—to truncations at different levels in the loop expansion of standard perturbation theory (Bernardeau et al., 2002), or by using fitting functions calibrated to match the output of N-body simulations (Mead et al., 2015). Since each proxy aggregates a different group of Fourier configurations, and these configurations vary in their response to features of the background cosmology, the sophistication needed to adequately capture the behaviour of the proxies may vary.

This is both a challenge and an opportunity. Proxies that require delicate modelling to obtain accurate predictions are harder to use, and may be expensive to deploy in a parameter-estimation Monte Carlo. In favourable cases, however, the payoff will be sensitive discrimination between nearby cosmological models. On the other hand, proxies that can be modelled robustly using simple methods are easy to use and cheap to deploy, but may offer correspondingly coarse discrimination. We study these trade-offs by contrasting predictions made using tree-level and one-loop SPT, and the halo model. For the halo-model power spectrum we choose the HMcode implementation (Mead et al., 2015). For the halo-model bispectrum we use the standard formulae given by Cooray & Sheth (2002) with a Sheth–Tormen mass function (Sheth & Tormen, 1999) and Navarro–Frenk–White halo profile (Navarro et al., 1996). In Section 5 we study the performance of each method compared to numerical estimates extracted directly from N-body simulations, which enables us to characterize the minimum adequate sophistication for each proxy. For simplicity our analysis is framed in terms of the underlying dark matter density field, although in Section 3.4 we explain how this can be extended to predict galaxy clustering.

**Covariance.—**To compute a likelihood for a given proxy, either for the purposes of parameter estimation or to make forecasts, we require an estimate for the covariance between different configurations. Therefore the minimum sophistication needed to adequately predict this covariance matrix will play an additional role in determining the relative expense of each proxy. In practice the covariance matrix is typically estimated by taking measurements from a large suite of N-body simulations or 2LPT catalogues, or, if this cannot be done, by falling back to a Gaussian approximation. N-body simulations give accurate results, but are expensive enough that assembling sufficient independent realizations to determine the inverse covariance is often not feasible. In comparison, catalogues based on 2LPT are significantly cheaper but become inaccurate in the non-linear regime, while the Gaussian prediction breaks down even earlier and may miss cross-correlations that significantly affect the outcome.

The relative importance of these cross-correlations varies between proxies. In Sections 6–7 we estimate their significance by comparing results from N-body and Gaussian covariances. We describe our procedure to estimate covariance matrices from the simulations in Section 5, but collect formulae for the Gaussian approximation here.

For comparison, the Gaussian covariance for the power spectrum and Fourier bispectrum, measured on a grid of spacing $\Delta k$ with fundamental frequency $k_{\mathrm{f}}=2\pi/V^{1/3}$ , can be written

[TABLE]

where $\bm{1}_{ij}$ is the Kronecker symbol, and

[TABLE]

The Kronecker symbol $\bm{1}_{\bm{\mathrm{k}},\bm{\mathrm{q}}}$ should be interpreted to equal unity if the triangles defined by $\{\bm{\mathrm{k}}_{1},\bm{\mathrm{k}}_{2},\bm{\mathrm{k}}_{3}\}$ and $\{\bm{\mathrm{q}}_{1},\bm{\mathrm{q}}_{2},\bm{\mathrm{q}}_{3}\}$ are equal, and zero otherwise. The degeneracy factor $\mathsf{N}$ equals unity for a scalene triangle, two for an isosceles triangle and six for an equilateral triangle.
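The degeneracy factor is simple to state in code. The following Python sketch classifies a triangle by its side lengths; the function name and the tolerance used to compare floating-point wavenumbers are assumptions of this example.

```python
def triangle_degeneracy(k1, k2, k3, tol=1e-12):
    """Degeneracy factor: 1 for scalene, 2 for isosceles, 6 for equilateral."""
    pairs = ((k1, k2), (k2, k3), (k1, k3))
    # Count how many pairs of sides agree to within the tolerance.
    equal_pairs = sum(abs(a - b) <= tol for a, b in pairs)
    if equal_pairs == 3:
        return 6
    if equal_pairs >= 1:
        return 2
    return 1
```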

3.1 Integrated bispectrum

To evaluate the expression (4) we first establish its relation to the underlying 3-point function. The overdensity within the subvolume labelled by $\bm{\mathrm{r}}_{L}$ can be written

[TABLE]

where $W_{L}(\bm{\mathrm{q}})=V_{s}\prod_{i=1}^{3}\operatorname{\mathrm{sinc}}(q_{i}L_{s}/2)$ is the Fourier transform of the cubic window function with side length $L_{s}$ , and $\operatorname{\mathrm{sinc}}x\equiv(\sin x)/x$ . The power spectrum in this subvolume is $P(\bm{\mathrm{k}},\bm{\mathrm{r}}_{L})\equiv\langle|\delta(\bm{\mathrm{k}},\bm{\mathrm{r}}_{L})|^{2}\rangle/V_{s}$ and the mean overdensity is $\bar{\delta}(\bm{\mathrm{r}}_{L})\equiv\delta(\bm{\mathrm{0}},\bm{\mathrm{r}}_{L})/V_{s}$ . Combining these with equation (2) yields (Chiang et al., 2014)

[TABLE]

Because $\operatorname{\mathrm{sinc}}x$ is strongly peaked for $|x|\lesssim\pi$ , the window functions $W_{L}$ effectively constrain the $q_{i}$ integrals to $q_{i}\lesssim 1/L_{s}$ . Since $k\gtrsim 1/L_{s}$ within each subvolume, the integral receives significant contributions only from squeezed configurations of the Fourier bispectrum in which the long-wavelength mode is of order the subvolume size or larger, because in the limit $q_{1},q_{2}\ll k$ we have $B^{\text{theory}}(\bm{\mathrm{k}}-\bm{\mathrm{q}}_{1},-\bm{\mathrm{k}}+\bm{\mathrm{q}}_{1}+\bm{\mathrm{q}}_{2},-\bm{\mathrm{q}}_{2})\approx B^{\text{theory}}(\bm{\mathrm{k}},-\bm{\mathrm{k}},-\bm{\mathrm{q}}_{2})$ .
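The behaviour of the cubic window can be checked directly from its definition. The Python sketch below evaluates $W_{L}(\bm{\mathrm{q}})$ using the unnormalized convention $\operatorname{\mathrm{sinc}}x=(\sin x)/x$ stated above; the function names are assumptions of this example. (Numerical libraries often define a normalized sinc, $\sin(\pi x)/(\pi x)$ , so the argument would need rescaling there.)

```python
import math


def sinc(x):
    """Unnormalized sinc, sin(x)/x, with the removable singularity at x = 0."""
    return 1.0 if x == 0.0 else math.sin(x) / x


def cubic_window(q, side):
    """Fourier transform W_L(q) of a cubic top-hat window of side length `side`."""
    volume = side ** 3
    return volume * math.prod(sinc(qi * side / 2.0) for qi in q)


side = 2.0
w_zero = cubic_window((0.0, 0.0, 0.0), side)                  # equals the subvolume V_s
w_first_null = cubic_window((2.0 * math.pi / side, 0.0, 0.0), side)  # first zero
```

The first zero along each axis lies at $q_{i}=2\pi/L_{s}$ , which is the sense in which the window restricts the $q_{i}$ integrals to wavenumbers of order the inverse subvolume size.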

Chiang et al. (2014) computed the linear response function using (12) and tree-level SPT, and verified that it reproduces equation (4) to within $2\%$ for $k\gtrsim 0.2\,h\,\text{Mpc}^{-1}$ . For our purposes we require accurate estimates at smaller $k$ , and therefore we perform a numerical integration using (12) directly. The integral is 8-dimensional and its evaluation is challenging; we implement it using the Vegas algorithm provided by the CUBA package (Hahn, 2016). To make the integration time feasible we densely sample $B^{\text{theory}}$ on a 3-dimensional cubic mesh in coordinates $(k_{1},k_{2},\mu_{12})$ , where $\mu_{12}\equiv(k_{1}^{2}+k_{2}^{2}-k_{3}^{2})/(2k_{1}k_{2})$ is the cosine of the angle between $\bm{\mathrm{k}}_{1}$ and $\bm{\mathrm{k}}_{2}$ and can be used in place of the third wavenumber $k_{3}$ . We construct a 3-dimensional cubic spline that interpolates between lattice points and use this spline to evaluate the integrand. To validate this procedure we have verified that our numerical results match the analytic prediction from the linear response function at large $k$ .
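The change of mesh coordinates from $(k_{1},k_{2},k_{3})$ to $(k_{1},k_{2},\mu_{12})$ can be sketched as follows. The Python functions below simply implement the definition of $\mu_{12}$ quoted above and its inverse; their names are assumptions of this example, and the spline interpolation itself is not reproduced.

```python
import math


def mu12_from_sides(k1, k2, k3):
    """mu_12 = (k1^2 + k2^2 - k3^2) / (2 k1 k2), as defined in the text."""
    return (k1 ** 2 + k2 ** 2 - k3 ** 2) / (2.0 * k1 * k2)


def k3_from_mu12(k1, k2, mu12):
    """Invert the definition of mu_12 to recover the third wavenumber."""
    return math.sqrt(k1 ** 2 + k2 ** 2 - 2.0 * k1 * k2 * mu12)


mu_equilateral = mu12_from_sides(1.0, 1.0, 1.0)
k3_roundtrip = k3_from_mu12(0.1, 0.2, mu12_from_sides(0.1, 0.2, 0.25))
```

Because $|\mu_{12}|\leqslant 1$ for every closed triangle, a regular mesh in $(k_{1},k_{2},\mu_{12})$ covers only valid configurations, whereas a regular mesh in $(k_{1},k_{2},k_{3})$ would include points that violate the triangle condition.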

Although we have not written subvolume labels explicitly, $\sigma_{L}^{2}$ and all power spectra in (4) refer to subsampled quantities, and therefore should be computed by appropriate convolution with the subvolume window function $W_{L}(\bm{\mathrm{q}})$ .

**Halo model.—**This procedure yields good results for tree-level and one-loop SPT, but does not perform well when applied to the halo model. In this case we do not recover equivalence between our evaluation of (12) and the linear response function, which we compute by numerical differentiation of the HMcode power spectrum. We interpret this disagreement as an indication that the standard halo model makes inconsistent predictions for the modulation of the power spectrum with $\bar{\delta}$ , or the squeezed limit of the bispectrum, or both. Moreover, comparison of the halo-model $ib$ computed using (12) to our N-body simulations shows poor agreement, suggesting that estimates based on (12) will be inaccurate. Therefore, for the halo model only, we estimate $ib$ by assuming the linear response approximation (4) and computing $\text{d}\ln P/\text{d}\bar{\delta}$ . We calculate the derivative using the simulation-calibrated formula proposed by Chiang et al. (2014),

[TABLE]

which gives reasonable agreement with our simulations.

**Covariance.—**In the absence of shot noise, the Gaussian covariance for estimates of $ib$ constructed from data can be written

[TABLE]

In this expression $V_{s}$ is the volume of a subsampled region and $V$ denotes the total survey volume. The quantity $N_{ks}=2\pi k^{2}\Delta kV_{s}$ is the number of Fourier modes in a subvolume $k$ -bin.

3.2 Line correlation function

Wolstenhulme et al. (2015) used tree-level SPT to predict the line correlation function. Their result was generalized to an arbitrary bispectrum by Eggemeier & Smith (2017), who gave the formula

[TABLE]

where $j_{0}(x)=\sin(x)/x$ is the spherical Bessel function of order zero and the integrals over $\bm{\mathrm{k}}_{1}$ and $\bm{\mathrm{k}}_{2}$ are cut off at the scale $k_{i}=2\pi/r$ . The quantity $B_{\epsilon}$ is defined by

[TABLE]

and gives the dominant contribution to the bispectrum of the phase field $\epsilon(\bm{\mathrm{k}})=\delta(\bm{\mathrm{k}})/|\delta(\bm{\mathrm{k}})|$ in the limit of large volume $V$ . For smaller volumes there are corrections scaling as powers of $V^{-1/2}$ compared to the dominant term (Eggemeier & Smith, 2017).

**Evaluation.—**To evaluate (15) we must perform a 6-dimensional integral. We use a strategy similar to that described in Section 3.1, by sampling the bispectrum over a cubic lattice and interpolating between lattice sites. The integration is again performed using Vegas.

In the special case of tree-level SPT, Wolstenhulme et al. (2015) showed that (15) could be reduced to a 3-dimensional integral,

[TABLE]

where $P^{\text{tree}}$ is the tree-level power spectrum, and the upper limit of the $\mu$ -integral is chosen to guarantee $|\bm{\mathrm{k}}_{1}+\bm{\mathrm{k}}_{2}|\leqslant 2\pi/r$ . That requires

[TABLE]

Equation (17) is useful because it provides a means to test the accuracy of our 6-dimensional Vegas integrations, and the 3-dimensional interpolations they entail. We have compared estimates for the tree-level line correlation function using both (15) and (17) and find good agreement.
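The constraint on the $\mu$ -integral can be made explicit with a short calculation. The Python sketch below assumes that $\mu$ denotes the cosine of the angle between $\bm{\mathrm{k}}_{1}$ and $\bm{\mathrm{k}}_{2}$ , so that $|\bm{\mathrm{k}}_{1}+\bm{\mathrm{k}}_{2}|^{2}=k_{1}^{2}+k_{2}^{2}+2k_{1}k_{2}\mu$ ; the function name and the clamping to $[-1,1]$ are choices made for this example and need not match the form in which the limit is written in the original equation.

```python
import math


def mu_upper_limit(k1, k2, r):
    """Largest mu for which |k1 + k2| <= 2 pi / r, clamped to [-1, 1]."""
    k_cut = 2.0 * math.pi / r
    # Solve k1^2 + k2^2 + 2 k1 k2 mu <= k_cut^2 for mu.
    mu = (k_cut ** 2 - k1 ** 2 - k2 ** 2) / (2.0 * k1 * k2)
    return max(-1.0, min(1.0, mu))


k_cut = 2.0 * math.pi  # cut-off for r = 1
mu_at_cutoff = mu_upper_limit(k_cut, k_cut, 1.0)          # both modes at the cut-off
mu_small_modes = mu_upper_limit(k_cut / 4.0, k_cut / 4.0, 1.0)  # no restriction
```

When both wavenumbers sit at the cut-off the angular range is restricted to $\mu\leqslant-1/2$ , whereas sufficiently small wavenumbers are unrestricted.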

**Covariance.—**To determine the Gaussian covariance we require the two-point function of the phase field,

[TABLE]

It follows that, in the absence of shot noise, the covariance between estimators for the line correlation function on scales $r_{i}$ and $r_{j}$ can be written (Eggemeier & Smith, 2017)

[TABLE]

where $k_{\mathrm{f}}=2\pi/V^{1/3}$ denotes the fundamental frequency (defined above equation (9)), and $r=\max\{r_{i},\,r_{j}\}$ . Note that (20) is not diagonal; the integral that defines the line correlation function depends on a range of Fourier modes for any scale $r_{i}$ , and any Fourier modes that are common between $\ell(r_{i})$ and $\ell(r_{j})$ will contribute a nonzero covariance. Moreover, equation (20) shows that the Gaussian covariance is independent of redshift and all cosmological parameters.

3.3 Modal bispectrum

It was explained in Section 2.3 that the modal decomposition is defined by choice of a basis $Q_{n}$ that samples groups of relevant Fourier configurations. The structure and ordering of the $Q_{n}$ determine those configurations we wish to prioritize. But unless we carefully adjust the $Q_{n}$ they will be correlated, and these correlations will be inherited by the $\beta_{n}^{Q}$ . The outcome is that the covariance matrix for estimators of the $\beta_{n}^{Q}$ is rather complex.

**Construction of $R$ -basis.—**To avoid this we redefine the basis, as in equation (8), to simplify the covariance matrix for estimators of the corresponding $\beta_{n}^{R}$ . The construction proceeds in stages. First, consider the expected signal-to-noise with which it is possible to measure a single mode $Q_{n}/w$ from (7). Using a Gaussian approximation for the noise this can be written

[TABLE]

We are free to choose the weight $w$ to simplify this integral. We define

[TABLE]

after which the computation of the expected signal-to-noise reduces to

[TABLE]

To write this and similar expressions economically we have introduced the notation

[TABLE]

for any $f$ and $g$ . In the special case that these depend only on the wavenumbers $k_{i}$ and not their orientations $\hat{\bm{\mathrm{k}}}_{i}$ some of the angular integrations are trivial and we obtain the simpler expression

[TABLE]

Here, $\mathcal{V}$ represents the set of points $(k_{1},k_{2},k_{3})$ where lines of length $k_{1}$ , $k_{2}$ and $k_{3}$ can be arranged to form a triangle, i.e. $2\max\{k_{i}\}\leqslant\sum_{i}k_{i}$ ; for details, see Fergusson et al. (2010). In principle the integral can be carried over all $k_{i}$ , but in practice it will be cut off at upper and lower limits $k_{\text{max}}$ and $k_{\text{min}}$ . The expressions (24) and (25) can be regarded as an inner product on the $Q_{n}$ that weights each contributing Fourier configuration according to its individual signal-to-noise.

Second, the $R$ -basis is chosen to be diagonal with respect to this inner product. As we will see below, because the resulting $R_{n}$ modes are orthogonal when weighted by signal-to-noise, the covariance matrix for estimators of the coefficients $\beta_{n}^{R}$ becomes diagonal under the same approximation of Gaussian noise used to determine the weighting in (21). Specifically, we define

[TABLE]

It is sometimes preferable to express results in terms of $\bar{\gamma}_{mn}$ , which is independent of $k_{\text{min}}$ and $k_{\text{max}}$ . For any suitable $Q$ -basis both $\gamma_{mn}$ and $\bar{\gamma}_{mn}$ will be symmetric and positive-definite and may be factored into the product of a matrix and its transpose. Therefore there exists a matrix $\lambda_{mn}$ such that $\bar{\gamma}_{mn}=\sum_{r}\lambda_{mr}\lambda_{nr}$ . Application of (8) with $\lambda_{mn}$ as the transformation matrix yields $R_{n}=\sum_{n^{\prime}}\lambda_{nn^{\prime}}^{-1}Q_{n^{\prime}}$ , and these modes are orthogonal in the sense

[TABLE]
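The construction of the $R$ -basis can be illustrated with a toy example. The Python sketch below replaces the signal-to-noise weighted inner product by a discrete weighted sum over sample points and uses three monomials as stand-in $Q$ -modes; these choices, and all function names, are assumptions of this example. It factors the Gram matrix as $\gamma=\lambda\lambda^{\mathsf{T}}$ by Cholesky decomposition, builds $R_{n}=\sum_{m}\lambda^{-1}_{nm}Q_{m}$ , and transforms the coefficients using $\beta^{R}_{n}=\sum_{m}\lambda_{mn}\beta^{Q}_{m}$ .

```python
import math


def gram(basis, weights):
    """Matrix of weighted inner products between sampled basis functions."""
    return [[sum(w * f * g for w, f, g in zip(weights, bm, bn)) for bn in basis]
            for bm in basis]


def cholesky(gamma):
    """Lower-triangular lam with gamma[m][n] = sum_r lam[m][r] * lam[n][r]."""
    n = len(gamma)
    lam = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = gamma[i][j] - sum(lam[i][r] * lam[j][r] for r in range(j))
            lam[i][j] = math.sqrt(s) if i == j else s / lam[j][j]
    return lam


def apply_inverse(lam, q_basis):
    """Forward substitution for lam R = Q, i.e. R_n = sum_m lam^{-1}_{nm} Q_m."""
    r_basis = []
    for i, q_row in enumerate(q_basis):
        row = [(q_row[p] - sum(lam[i][j] * r_basis[j][p] for j in range(i))) / lam[i][i]
               for p in range(len(q_row))]
        r_basis.append(row)
    return r_basis


# Toy Q-modes (hypothetical): monomials sampled at ten points with equal weights.
points = [0.1 * i for i in range(1, 11)]
weights = [1.0 / len(points)] * len(points)
q_basis = [[x ** n for x in points] for n in range(3)]

lam = cholesky(gram(q_basis, weights))
r_basis = apply_inverse(lam, q_basis)
gram_r = gram(r_basis, weights)  # should be the identity matrix

# Coefficients transform with the transpose of lam, leaving the expansion unchanged.
beta_q = [1.0, -2.0, 0.5]
beta_r = [sum(lam[m][n] * beta_q[m] for m in range(3)) for n in range(3)]
```

Any factorization $\gamma=\lambda\lambda^{\mathsf{T}}$ would serve; Cholesky decomposition is used here only because it is simple to write down.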

**Determination of modal coefficients.—**Whether we work with the $Q$ - or $R$ -basis, we must predict the corresponding $\beta$ -coefficients for each model of interest. In practice the extra matrix operations needed to obtain the $R$ -basis mean that it is simplest to perform calculations in the $Q$ -basis, before translating to the $R$ -basis to interpret the results. We adopt this procedure whenever concrete calculations using the modal decomposition are required. We use the $Q$ -basis constructed by Fergusson et al. (2010). (The details are summarized in Appendix A.1.) It is not intended to prioritize any single class of Fourier configurations, but rather attempts to provide a good description of reasonably smooth bispectra over a range of shapes and scales.

To extract the $\beta_{n}^{Q}$ we use (24). Assuming (7) can be interpreted as an equality, we conclude that for an arbitrary bispectrum $B^{\text{theory}}(k_{1},k_{2},k_{3})$

[TABLE]

Finally, the individual $\beta_{n}^{Q}$ should be extracted by contraction with the inverse matrix $\gamma_{mn}^{-1}$ . If the bispectrum has no angular dependence then the inner product can be computed using the simplified expression (25), which yields

[TABLE]

where we have used the quantity $B_{\epsilon}$ defined in (16). The $\beta_{n}^{R,\text{theory}}$ may be obtained by the transformation $\beta_{n}^{R}=\sum_{m}\lambda_{mn}\beta^{Q}_{m}$ . The appearance of the phase bispectrum in (29) is a consequence of our choice of weight $w$ .

Equation (28) would continue to apply were we to change the definition of the ‘inner product’ $\langle\kern-2.4pt\langle\cdot|\cdot\rangle\kern-2.4pt\rangle$ , and an analogue of (29) would continue to give the individual $\beta_{n}^{Q,\text{theory}}$ . Our choice of signal-to-noise weighting in $\langle\kern-2.4pt\langle\cdot|\cdot\rangle\kern-2.4pt\rangle$ is important only for construction of the $R$ -modes and the covariance inherited by the $\beta_{n}^{R,\text{theory}}$ .

**Numerical evaluation.—**In practice, equation (29) requires evaluation of a 3-dimensional integral over the region $\mathcal{V}$ . To implement it we compute $wB$ on a $200^{3}$ cubic lattice in $(k_{1},k_{2},k_{3})$ and estimate the integral by volume-weighted cubature over this lattice. Some work is required to account for irregular boundary orientations; we give these details in Appendix A.2.
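A simplified version of this lattice integration can be sketched in Python. The example below uses a plain midpoint rule in which a cell is counted only if its centre satisfies the triangle condition; this is a cruder treatment of the boundary than the volume-weighted scheme of Appendix A.2, and the function name and lattice size are assumptions of this example.

```python
def tetrapyd_integral(f, k_min, k_max, n):
    """Midpoint-rule estimate of the integral of f over triangle configurations."""
    h = (k_max - k_min) / n
    centres = [k_min + (i + 0.5) * h for i in range(n)]
    total = 0.0
    for k1 in centres:
        for k2 in centres:
            for k3 in centres:
                # Keep only cells whose centre satisfies the triangle condition.
                if 2.0 * max(k1, k2, k3) <= k1 + k2 + k3:
                    total += f(k1, k2, k3)
    return total * h ** 3


# Integrating unity gives the volume of the allowed region inside the unit cube.
volume = tetrapyd_integral(lambda k1, k2, k3: 1.0, 0.0, 1.0, 20)
```

For the unit cube the allowed region occupies exactly half of the volume, which provides a convenient check on the masking.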

**Covariance.—**Finally we compute the covariance of estimators for the $\beta_{n}^{R}$ coefficients under the assumption of Gaussian covariance for the bispectrum estimator $\delta(\bm{\mathrm{k}}_{1})\delta(\bm{\mathrm{k}}_{2})\delta(\bm{\mathrm{k}}_{3})V^{-1}\bm{1}_{\bm{\mathrm{k}}_{1}+\bm{\mathrm{k}}_{2}+\bm{\mathrm{k}}_{3},\bm{\mathrm{0}}}$ . Using equation (24), and (28) with $R$ exchanged for $Q$ , we obtain

[TABLE]

The weighting for each Fourier configuration matches the signal-to-noise, making this correlator diagonal as a consequence of our construction of the $R$ -basis. Therefore we conclude

[TABLE]

As with the Gaussian covariance of the line correlation function, this covariance is independent of redshift and cosmological parameters. If we were to abandon the approximation of Gaussian covariance then (30) would no longer be exactly proportional to $\langle\kern-2.4pt\langle R_{m}|R_{n}\rangle\kern-2.4pt\rangle$ . In this case the amplitude of the diagonal elements would be modified, and non-diagonal components would appear.

3.4 Galaxy bias

The discussion in Sections 3.1–3.3 was framed in terms of the dark matter overdensity $\delta$ , but this is not what is measured by surveys of large-scale structure. Instead, they record the abundance of galaxies or some other population of tracers whose density responds to the dark matter density but need not match it.

On large scales the relation between the galaxy ( $\delta_{g}$ ) and dark matter ( $\delta$ ) density fields is well-described by the linear model $\delta_{g}=b_{1}\delta$ (Kaiser, 1984; Fry & Gaztanaga, 1993). The linear bias parameter $b_{1}$ may be redshift-dependent, and varies between different populations of galaxies. On small scales the overdensities are larger, and both non-linear and non-local corrections become important. To obtain a satisfactory description we must typically include terms that are at least quadratic in $\delta$ (Fry & Gaztanaga, 1993; Smith et al., 2007), together with terms involving the tidal gravitational field (Catelan et al., 2000; McDonald & Roy, 2009; Chan et al., 2012; Baldauf et al., 2012).

In what follows we assume the local Lagrangian bias model, in which the galaxy overdensity at early times is taken to be a local function of the dark matter overdensity. At later times the bias is determined by propagating this relationship along the dark matter flow. McDonald & Roy (2009) demonstrated that this implies the Eulerian galaxy overdensity at the time of observation can be written

[TABLE]

where ‘ $\cdots$ ’ denotes terms of third order and higher that we have not written explicitly. The field $s^{2}(\bm{\mathrm{x}})=s^{ij}(\bm{\mathrm{x}})\,s_{ji}(\bm{\mathrm{x}})$ is a contraction of the tidal tensor, defined by $s_{ij}(\bm{\mathrm{x}})\equiv\left[\partial_{i}\partial_{j}\nabla^{-2}-\frac{1}{3}\bm{1}_{ij}\right]\delta(\bm{\mathrm{x}})$ . Therefore, up to second order in $\delta$ , we require two additional redshift- and population-dependent bias parameters: the quadratic bias $b_{2}$ , as well as the non-local bias $b_{s^{2}}$ . In the local Lagrangian model the non-local bias satisfies $b_{s^{2}}=-4(b_{1}-1)/7$ (Chan et al., 2012; Baldauf et al., 2012), although in more general biasing prescriptions it could be allowed to vary independently.

**Power spectrum.—**After translating to Fourier space it follows that the tree-level galaxy power spectrum can be written

[TABLE]

To obtain a consistent result at one-loop we should include the unwritten third-order contributions in (32), which generate multiplicative renormalizations of the linear power spectrum in the same way as the ‘13’ terms of one-loop SPT. McDonald & Roy (2009) showed that these could be collected into a single new parameter which we denote $b_{3\text{nl}}$ to match Gil-Marin et al. (2015). Therefore

[TABLE]

Saito et al. (2014) showed that in the local Lagrangian model $b_{3\text{nl}}$ satisfies $b_{3\text{nl}}=32(b_{1}-1)/315$ . Explicit expressions for all terms appearing in (34) were given by McDonald & Roy (2009). Note that contributions from the non-linear bias appear only in the one-loop power spectrum.
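The two local-Lagrangian relations quoted in this section fix $b_{s^{2}}$ and $b_{3\text{nl}}$ once $b_{1}$ is specified. A small Python helper makes this explicit; the function name is an assumption of this example, and exact rational arithmetic is used only to make the coefficients easy to read.

```python
from fractions import Fraction


def local_lagrangian_biases(b1):
    """Return (b_s2, b_3nl) implied by the linear bias b1."""
    b_s2 = Fraction(-4, 7) * (b1 - 1)     # non-local (tidal) bias
    b_3nl = Fraction(32, 315) * (b1 - 1)  # third-order non-local bias
    return b_s2, b_3nl


b_s2, b_3nl = local_lagrangian_biases(2)
```

Both parameters vanish for an unbiased tracer with $b_{1}=1$ , so in this model the only free bias parameters up to the order considered are $b_{1}$ and $b_{2}$ .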

**Bispectrum.—**In contrast to the power spectrum, the bispectrum receives corrections from non-linear bias terms even at tree-level. Specifically,

[TABLE]

where $S_{2}(\bm{\mathrm{k}}_{1},\bm{\mathrm{k}}_{2})\equiv(\bm{\mathrm{k}}_{1}\cdot\bm{\mathrm{k}}_{2})^{2}/(k_{1}k_{2})^{2}-1/3$ is the kernel appearing in the Fourier transform of the contracted tidal field, $s^{2}(\bm{\mathrm{k}})=(2\pi)^{-3}\int\text{d}^{3}q\,S_{2}(\bm{\mathrm{q}},\bm{\mathrm{k}}-\bm{\mathrm{q}})\delta(\bm{\mathrm{q}})\delta(\bm{\mathrm{k}}-\bm{\mathrm{q}})$ .
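The tidal kernel depends only on the angle between its arguments, as the following Python sketch of the definition shows; the function name and the representation of wavevectors as 3-tuples are assumptions of this example.

```python
def s2_kernel(k1, k2):
    """S_2(k1, k2) = (k1 . k2)^2 / (k1 k2)^2 - 1/3 for 3-vectors k1 and k2."""
    dot = sum(a * b for a, b in zip(k1, k2))
    norm1_sq = sum(a * a for a in k1)
    norm2_sq = sum(b * b for b in k2)
    return dot * dot / (norm1_sq * norm2_sq) - 1.0 / 3.0


s2_parallel = s2_kernel((1.0, 0.0, 0.0), (2.0, 0.0, 0.0))
s2_perpendicular = s2_kernel((1.0, 0.0, 0.0), (0.0, 3.0, 0.0))
```

The kernel ranges from $-1/3$ for perpendicular wavevectors to $2/3$ for parallel or anti-parallel ones, so the $b_{s^{2}}$ term modifies the shape dependence of the bispectrum rather than its overall amplitude.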

To obtain the galaxy bispectrum consistently at one loop one should compute the dark matter overdensity to fourth order in perturbation theory and develop the bias expansion to the same order. This procedure has been adumbrated in the literature (Assassi et al., 2014) but not developed completely. Therefore to obtain an estimate of the one-loop bispectrum we make the approximation

[TABLE]

This is consistent with the prescriptions used by Gil-Marin et al. (2015) and Baldauf et al. (2016).

**Application to bispectrum proxies.—**The outcome of this discussion is that, to predict the integrated bispectrum, line correlation function, or modal bispectrum for the galaxy density field, we should make the replacements $P^{\text{theory}}(k)\rightarrow P^{\text{theory}}_{\text{gal}}(k)$ and $B^{\text{theory}}(k_{1},k_{2},k_{3})\rightarrow B^{\text{theory}}_{\text{gal}}(k_{1},k_{2},k_{3})$ where necessary in equations (12), (15) and (29).

To obtain theory predictions at tree-level we use equations (33) and (35), whereas to obtain predictions at one-loop we use equations (34) and (36). Finally, to evaluate predictions using the halo model we apply equations (34) and (35), but with $P^{\text{1-loop}}\rightarrow P^{\text{halo}}$ and $B^{\text{tree}}\rightarrow B^{\text{halo}}$ for the dark matter correlations.

4 Estimating bispectrum proxies from N-body simulations

In this section we briefly describe our N-body simulations and explain how they are used to estimate the Fourier bispectrum and its proxies $ib$ , $\ell$ and $\beta_{n}^{Q}$ .

4.1 Simulations

Our measurements are based on two sets of simulations: (1) $200$ N-body simulations containing dark matter only, with a fixed choice of fiducial cosmological parameters; (2) a total of $60$ simulations constructed by varying one cosmological parameter at a time, with four realizations per model including the fiducial set. These simulations were performed on the ZBOX supercomputer at the University of Zurich and were described in Smith (2009) and Smith et al. (2014). Each set uses a comoving boxsize of $L=1500\,h^{-1}\,\text{Mpc}$ and contains $N=750^{3}$ particles. Initial conditions for the particles were set at redshift $z=49$ using second-order Lagrangian perturbation theory acting on a realization of a Gaussian random field (Crocce et al., 2006) with transfer functions from CMBFAST (Seljak & Zaldarriaga, 1996). The particles are evolved to $z=0$ under the influence of gravity using the Gadget-2 code (Springel, 2005), modified to allow a time-evolving equation of state for dark energy.

The fiducial cosmological parameters correspond to a flat $\Lambda$ CDM model and are summarized in Table 1. Specifically, $\Omega_{m}$ and $\Omega_{b}$ are the matter and baryon density parameters; $w_{0}$ and $w_{a}$ parametrize the equation of state for dark energy, viz. $w(a)\equiv w_{0}+(1-a)\,w_{a}$ ; $\sigma_{8}$ is the amplitude of density fluctuations smoothed on a scale $8\,h^{-1}\,\text{Mpc}$ ; $n_{s}$ is the spectral index of the primordial power spectrum; and $h$ is the dimensionless Hubble parameter. We collectively write these as a vector $\theta_{\alpha}$ with index $\alpha$ labelling the different parameters. To construct set (2) each parameter is offset by $+\Delta\theta_{\alpha}$ and $-\Delta\theta_{\alpha}$ , with all other parameters held fixed. The stepsizes $\Delta\theta_{\alpha}$ are listed in Table 1. To reduce noise when estimating parameter derivatives, we construct initial conditions for each of the four realizations using the same Gaussian random field as its fiducial partner. Since we vary over seven cosmological parameters this gives a total of $4+2\times 7\times 4=60$ simulations in the suite.
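The bookkeeping for simulation set (2) can be checked with a short sketch. The parameter labels below are just names for the seven quantities listed in the text; the step sizes of Table 1 are not reproduced, and the function name is a hypothetical helper.

```python
# Labels for the seven parameters varied in set (2). The step sizes live in
# Table 1 and are deliberately not reproduced here.
PARAMS = ["Omega_m", "Omega_b", "w_0", "w_a", "sigma_8", "n_s", "h"]
N_REALIZATIONS = 4  # realizations per model, sharing phases with the fiducial runs


def suite_layout(params, n_real):
    """Return (model, realization) labels for the fiducial and offset runs."""
    runs = [("fiducial", r) for r in range(n_real)]
    for name in params:
        for sign in ("+", "-"):
            runs.extend((f"{name}{sign}", r) for r in range(n_real))
    return runs


runs = suite_layout(PARAMS, N_REALIZATIONS)
print(len(runs))  # 4 + 2 * 7 * 4
```

Each offset run with realization index `r` is paired with the fiducial run of the same index, which is what allows the realization-dependent scatter to cancel in the derivative estimates.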

4.2 Density field

To compute the overdensity field in each simulation we use the cloud-in-cell assignment scheme to distribute particles over a regular Cartesian grid. We apply a fast Fourier transform and extract the discrete Fourier-space density field by deconvolving the cloud-in-cell window function. The result is

[TABLE]

The labels ‘disc’ and ‘grid’ label Fourier-space fields in the full volume $V$ and on the cloud-in-cell grid, respectively. The Nyquist frequency $k_{\text{Ny}}=\pi N_{\text{grid}}/L$ is determined by the number of grid cells per dimension. For our numerical results we use $N_{\text{grid}}=512$ .
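As a rough numerical illustration, the sketch below evaluates the fundamental and Nyquist frequencies for the quoted box and grid, and applies a window correction to a single mode. The window used here is the standard cloud-in-cell form, a product of $\mathrm{sinc}^{2}$ factors; this is an assumption, since the explicit expression is not reproduced above, and the function names are hypothetical.

```python
import math

L_BOX = 1500.0   # comoving box size in Mpc/h (Section 4.1)
N_GRID = 512     # grid cells per dimension (Section 4.2)

k_f = 2.0 * math.pi / L_BOX        # fundamental frequency
k_ny = math.pi * N_GRID / L_BOX    # Nyquist frequency


def cic_window(kx, ky, kz, k_nyquist=k_ny):
    """Assumed cloud-in-cell window: product of sinc^2(pi k_i / (2 k_Ny))."""
    w = 1.0
    for k in (kx, ky, kz):
        x = math.pi * k / (2.0 * k_nyquist)
        w *= 1.0 if x == 0.0 else (math.sin(x) / x) ** 2
    return w


def deconvolve(delta_grid_k, kx, ky, kz):
    """Divide one gridded Fourier mode by the window to undo the smoothing."""
    return delta_grid_k / cic_window(kx, ky, kz)


print(f"k_f  = {k_f:.5f} h/Mpc")
print(f"k_Ny = {k_ny:.4f} h/Mpc")
```

The correction is negligible for $k\ll k_{\text{Ny}}$ and grows towards the Nyquist frequency, where the assumed window suppresses a mode along one axis by a factor $(2/\pi)^{2}$.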

4.3 Estimating the power spectrum

Given a realization of the $\delta$ -field within a simulation volume $V=L^{3}=(2\pi)^{3}\delta_{\text{D}}(\bm{\mathrm{0}})$ , a simple estimator for the power at wavevector $\bm{\mathrm{k}}_{1}$ can be written $\hat{\mathcal{P}}(\bm{\mathrm{k}}_{1},\bm{\mathrm{k}}_{2})=\delta(\bm{\mathrm{k}}_{1})\delta(\bm{\mathrm{k}}_{2})\bm{1}_{\bm{\mathrm{k}}_{1},-\bm{\mathrm{k}}_{2}}/V$ . [Footnote 3: In the remainder of this paper we assume it is understood that we are dealing with the discrete density field whenever we refer to measured quantities, and drop the label ‘disc’.] Unfortunately this procedure is very noisy. An improved estimate can be obtained by summing over a set of modes satisfying the closure criterion $\sum_{i}\bm{\mathrm{k}}_{i}=\bm{\mathrm{0}}$ within a thin $\bm{\mathrm{k}}$ -shell. Since we are working in finite volume the available modes are discretized in units of the fundamental frequency $k_{\mathrm{f}}=2\pi/L$ , and therefore the thin-shell average should be written

[TABLE]

where $\Delta k\geqslant k_{\mathrm{f}}$ represents a bin width, and we have introduced the binning function $\tilde{\Pi}_{k}(\bm{\mathrm{q}})$ which is defined to be unity if $|\bm{\mathrm{q}}|\in[k-\Delta k/2,k+\Delta k/2]$ and zero otherwise. Finally, the quantity $V_{P}$ represents the volume of the spherical shell accounting for discretization,

[TABLE]
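The effect of discretization on the shell volume can be illustrated by counting modes directly. The sketch below counts integer wavevectors $\bm{\mathrm{n}}=\bm{\mathrm{q}}/k_{\mathrm{f}}$ inside a shell and compares the result with the continuum shell volume in units of $k_{\mathrm{f}}^{3}$. The function names are hypothetical and the brute-force loop is meant only for small shells.

```python
import math


def count_shell_modes(n_center, width=1.0):
    """Count integer wavevectors n = q / k_f with |n| inside the closed shell."""
    lo, hi = n_center - width / 2.0, n_center + width / 2.0
    n_max = int(math.floor(hi))
    count = 0
    for nx in range(-n_max, n_max + 1):
        for ny in range(-n_max, n_max + 1):
            for nz in range(-n_max, n_max + 1):
                if lo <= math.sqrt(nx * nx + ny * ny + nz * nz) <= hi:
                    count += 1
    return count


def continuum_shell_modes(n_center, width=1.0):
    """Exact volume of the spherical shell in units of k_f^3."""
    lo, hi = n_center - width / 2.0, n_center + width / 2.0
    return 4.0 * math.pi * (hi ** 3 - lo ** 3) / 3.0


for n in (2, 8, 16):
    print(n, count_shell_modes(n), round(continuum_shell_modes(n), 1))
```

The two numbers differ most for the lowest shells, which is why a normalization that accounts for discretization is preferable to the continuum volume on the largest scales.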

4.4 Estimating the bispectrum

In analogy with the power spectrum, an estimator for a single configuration of the Fourier bispectrum can be written $\hat{\mathcal{B}}(\bm{\mathrm{k}}_{1},\bm{\mathrm{k}}_{2},\bm{\mathrm{k}}_{3})=\delta(\bm{\mathrm{k}}_{1})\delta(\bm{\mathrm{k}}_{2})\delta(\bm{\mathrm{k}}_{3})\bm{1}_{\bm{\mathrm{k}}_{1}+\bm{\mathrm{k}}_{2}+\bm{\mathrm{k}}_{3},\bm{\mathrm{0}}}/V$ . [This expression was already used in Section 3.3 to obtain the Gaussian covariance for estimators of the $\beta_{n}^{R}$ .] To obtain an acceptable signal-to-noise we should again average over a set of configurations whose wavenumbers lie within suitable discretized $\bm{\mathrm{k}}$ -shells. After doing so we obtain the estimator

[TABLE]

where the normalization $V_{B}$ should now be evaluated using (Sefusatti et al., 2006; Joachimi et al., 2009)

[TABLE]

Dividing by the square of the fundamental cell volume shows that the number of configurations scales as $N_{\text{triangles}}(k_{1},k_{2},k_{3})=V_{B}(k_{1},k_{2},k_{3})/k_{\mathrm{f}}^{6}\propto N_{1}N_{2}N_{3}$ , where $N_{i}\equiv k_{i}/k_{\mathrm{f}}$ is the length of the side $k_{i}$ in units of the fundamental mode. Hence, if we scale the configuration by $k_{i}\rightarrow\lambda k_{i}$ then the number of available configurations scales as $\lambda^{3}$ .
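The $\lambda^{3}$ scaling can be made concrete with a small sketch. It assumes the commonly quoted thin-shell approximation $V_{B}\approx 8\pi^{2}k_{1}k_{2}k_{3}\,\Delta k^{3}$ for non-degenerate triangles; the prefactor is an assumption taken from the standard literature rather than from the expression above, and only the proportionality to $N_{1}N_{2}N_{3}$ is used in the text.

```python
import math


def n_triangles(n1, n2, n3, bin_width=1.0):
    """Approximate triangle count for sides n_i = k_i / k_f.

    Assumes the thin-shell form V_B ~ 8 pi^2 k1 k2 k3 (Delta k)^3 for
    non-degenerate triangles, with bin_width = Delta k / k_f.
    """
    return 8.0 * math.pi ** 2 * n1 * n2 * n3 * bin_width ** 3


base = n_triangles(10, 12, 15)
scaled = n_triangles(20, 24, 30)   # same shape, lambda = 2
print(scaled / base)               # lambda^3
```

Doubling every side at fixed bin width therefore multiplies the number of contributing triangles by eight, so small-scale configurations are averaged over far more modes than large-scale ones.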

Sefusatti (2005), Fergusson, Regan & Shellard (2012) and Scoccimarro (2015) observed that (40) could be implemented efficiently by rewriting the Dirac $\delta$ -function using its Fourier representation, $(2\pi)^{3}\delta_{\text{D}}(\bm{\mathrm{q}})=\int\text{d}^{3}x\,\mathrm{e}^{\mathrm{i}\bm{\mathrm{q}}\cdot\bm{\mathrm{x}}}$ , and factorizing the dependence on the $\bm{\mathrm{q}}_{i}$ . This yields

[TABLE]

Similarly,

[TABLE]

where $\Pi_{k}(\bm{\mathrm{x}})$ is the inverse Fourier transform of $\tilde{\Pi}_{k}(\bm{\mathrm{q}})$ .

Equation (42) is numerically more efficient than a direct implementation of (40), because it requires only three Fourier transforms to compute $\mathcal{D}_{k}$ for each wavenumber in the triplet $\{k_{1},k_{2},k_{3}\}$ . Moreover, once each $\mathcal{D}_{k}$ has been obtained it can be re-used for any configuration that shares the same wavenumber. In spite of this improvement, however, it remains a formidable computational challenge to estimate all bispectrum configurations contained within a large volume $V$ . Different strategies have been employed to make the calculation feasible. One option is to coarsely bin configurations with binning width equal to several times the fundamental mode. This drastically reduces the number of configurations to be measured. An alternative is to search only among a limited subset of configurations. This may be helpful if we wish to search for specific physical effects, but risks overlooking important signals if we are searching blindly. In either case the analysis is unlikely to be optimal because information is lost.
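The factorization that makes this estimator efficient can be demonstrated on a one-dimensional periodic toy grid. The sketch below compares a brute-force sum over closed triplets with the same sum obtained from three inverse transforms followed by a single pass over real space. It is an illustration of the identity only, not the estimator used in this paper: the transform is a naive DFT, closure holds modulo the grid size, and the coefficient arrays stand in for $\delta(\bm{\mathrm{q}})$ multiplied by the binning functions.

```python
import cmath
import random


def inverse_dft(coeffs):
    """Unnormalized inverse DFT: F(x) = sum_q f(q) exp(2 pi i q x / N)."""
    n = len(coeffs)
    return [
        sum(c * cmath.exp(2j * cmath.pi * q * x / n) for q, c in enumerate(coeffs))
        for x in range(n)
    ]


def closed_triplet_sum_direct(a, b, c):
    """Brute-force sum of a(q1) b(q2) c(q3) over q1 + q2 + q3 = 0 (mod N)."""
    n = len(a)
    total = 0j
    for q1 in range(n):
        for q2 in range(n):
            q3 = (-q1 - q2) % n
            total += a[q1] * b[q2] * c[q3]
    return total


def closed_triplet_sum_fast(a, b, c):
    """Same sum from three transforms and one pass over real space."""
    fa, fb, fc = inverse_dft(a), inverse_dft(b), inverse_dft(c)
    return sum(x * y * z for x, y, z in zip(fa, fb, fc)) / len(a)


def random_modes(rng, n):
    """Toy complex coefficients standing in for shell-filtered density modes."""
    return [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]


rng = random.Random(0)
a, b, c = random_modes(rng, 16), random_modes(rng, 16), random_modes(rng, 16)
direct = closed_triplet_sum_direct(a, b, c)
fast = closed_triplet_sum_fast(a, b, c)
print(abs(direct - fast))
```

With a fast Fourier transform in place of the naive DFT, the cost of the factorized form is set by the transforms rather than by the number of triplets, and each filtered field can be cached and shared between configurations.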

4.5 Estimating the integrated bispectrum

Our procedure to estimate the integrated bispectrum is based directly on its definition. We separate the total volume into $N_{s}$ subvolumes, enumerated by the labels $i=1,\ldots,N_{s}$ . We compute the mean overdensity $\hat{\bar{\delta}}_{i}$ and power spectrum $\hat{P}(k)_{i}$ within each subvolume. Finally, we average the product $\hat{P}(k)_{i}\hat{\bar{\delta}}_{i}$ over all subvolumes. Therefore,

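$$\widehat{iB}(k)=\frac{1}{N_{s}}\sum_{i=1}^{N_{s}}\hat{P}(k)_{i}\,\hat{\bar{\delta}}_{i}.\tag{44}$$

The normalized integrated bispectrum can be obtained by rescaling,

$$\widehat{ib}(k)=\frac{\widehat{iB}(k)}{\hat{P}(k)\,\hat{\sigma}_{L}^{2}},\tag{45}$$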

where $\hat{P}(k)=\sum_{i=1}^{N_{s}}\hat{P}(k)_{i}/N_{s}$ is the average subvolume power spectrum and $\hat{\sigma}_{L}^{2}=\sum_{i=1}^{N_{s}}\hat{\bar{\delta}}_{i}^{2}/N_{s}$ is the average variance of the mean overdensity.
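The averaging steps described in this subsection can be sketched in a few lines. The function names and the input numbers are hypothetical: the toy data are constructed so that the subvolume power responds linearly to the mean overdensity, with response 2 in the first $k$ -bin and 1 in the second.

```python
def integrated_bispectrum(p_sub, delta_sub):
    """Average P_i(k) * delta_i over subvolumes, for each k-bin.

    p_sub[i][j] is the power in k-bin j of subvolume i;
    delta_sub[i] is the mean overdensity of subvolume i.
    """
    n_s = len(p_sub)
    n_k = len(p_sub[0])
    return [
        sum(p_sub[i][j] * delta_sub[i] for i in range(n_s)) / n_s
        for j in range(n_k)
    ]


def normalized_integrated_bispectrum(p_sub, delta_sub):
    """Rescale by the mean subvolume power and the variance of the mean overdensity."""
    n_s = len(p_sub)
    n_k = len(p_sub[0])
    ib_raw = integrated_bispectrum(p_sub, delta_sub)
    p_mean = [sum(p_sub[i][j] for i in range(n_s)) / n_s for j in range(n_k)]
    sigma2 = sum(d * d for d in delta_sub) / n_s
    return [ib_raw[j] / (p_mean[j] * sigma2) for j in range(n_k)]


# Made-up numbers for three subvolumes and two k-bins (illustration only).
p_sub = [[100.0, 40.0], [120.0, 44.0], [80.0, 36.0]]
delta_sub = [0.0, 0.1, -0.1]
ib = normalized_integrated_bispectrum(p_sub, delta_sub)
print(ib)
```

In this toy case the normalized statistic recovers the linear response of the local power spectrum to the mean overdensity, which is the usual interpretation of the integrated bispectrum.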

4.6 Estimating the line correlation function

A procedure to estimate the line correlation function was outlined by Eggemeier & Smith (2017). We evaluate

[TABLE]

where $\overline{j_{0}}(|\bm{\mathrm{k}}|r)$ denotes an average of $j_{0}(kr)$ taken over the volume of a fundamental $k$ -space cell centred at $\bm{\mathrm{k}}$ . The sum scales as $\sim(2L/r)^{6}$ , making its evaluation fast on large scales but challenging on small ones, where the sum includes the majority of Fourier modes. On scales below $\sim 105\,h^{-1}\,\text{Mpc}$ we find that the real space estimator described by Eggemeier & Smith (2017) becomes more efficient and therefore we use it within that regime. For scales accessible to both schemes we verified that both estimators yield the same result.

4.7 Estimating the modal bispectrum

Equation (29) shows that an estimate of the modal coefficient $\beta^{Q}_{m}$ requires evaluation of $\langle\kern-2.4pt\langle w\hat{\mathcal{B}}|Q_{n}\rangle\kern-2.4pt\rangle$ , where $\hat{\mathcal{B}}$ is the bispectrum estimator defined in Section 4.4. Using equation (24), writing the $\delta$ -function using its Fourier representation, and factorizing the integral as described in Section 4.4, we find

[TABLE]

Here, $q_{n}(k)$ is a polynomial used in the construction of the modes $Q_{n}$ ; see Appendix A.1. Equation (47) shows that the computation can be reduced to a single 3-dimensional integral over the $\mathcal{M}_{n}(\bm{\mathrm{x}})$ , which are themselves weighted Fourier transforms of $\delta$ . Finally, $\beta_{m}^{Q}$ can be estimated by contracting with the inverse inner product matrix $\gamma_{mn}^{-1}$ defined in (26),

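$$\beta^{Q}_{m}=\sum_{n}\gamma^{-1}_{mn}\,\langle\kern-2.4pt\langle w\hat{\mathcal{B}}|Q_{n}\rangle\kern-2.4pt\rangle.\tag{48}$$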

To obtain the corresponding $R$ -basis coefficients requires a further linear transformation

[TABLE]

where $\lambda_{mn}$ is the matrix defined above (27). As explained in Section 3.3, we generally perform numerical calculations in the $Q$ -basis in order to preserve the simplicity of (47), but present results in the $R$ -basis because their covariance properties make these coefficients simpler to interpret. In either basis, the measured coefficients can be used to reconstruct the bispectrum for any required Fourier configuration using equation (8).

Note that, because the matrix $\gamma_{nm}$ can be tabulated, measuring a single modal coefficient has the same computational complexity as measuring a single configuration of the Fourier bispectrum.

4.8 Choice of bins

In Table 2 we summarize the parameters used in implementing estimators for each of these statistical quantities. The power spectrum and Fourier bispectrum are binned by averaging over shells of width $\Delta k$ as explained in Sections 4.3–4.4. For the same reasons we also average the subvolume power spectra used to construct the integrated bispectrum. The line correlation function and modal coefficients do not involve averaging over shells, but instead are evaluated using equations (46) and (47) which are themselves aggregates over groups of configurations. For each statistic we report the minimum and maximum $k$ -modes that contribute, and the total number of measurements or bins. Note that the bispectrum bin width corresponds to $\Delta k=8\,k_{\mathrm{f}}$ .

In what follows we will label the Fourier configurations for the bispectrum using the scheme of Gil-Marin et al. (2016). We assign the label (or ‘index’) zero to the equilateral configuration with $k_{1}=k_{2}=k_{3}=k_{\text{min}}$ . The remaining configurations are ordered so that $k_{1}\leqslant k_{2}\leqslant k_{3}$ and $k_{3}\leqslant k_{1}+k_{2}$ . Their labels are assigned by sequentially increasing $k_{3}$ , $k_{2}$ and $k_{1}$ (in this order) and incrementing the index for each valid triangle.
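The labelling scheme can be sketched as three nested loops. The sketch reads "sequentially increasing $k_{3}$ , $k_{2}$ and $k_{1}$ (in this order)" as meaning that $k_{3}$ varies fastest and $k_{1}$ slowest, which is an interpretation. Side lengths are integers in units of $k_{\mathrm{f}}$ , and the binning in the example is hypothetical, not the one listed in Table 2.

```python
def label_triangles(n_min, n_step, n_bins):
    """Enumerate triangle bins in the order described in the text.

    Side lengths are integers in units of k_f: n = n_min + i * n_step.
    k_1 is the slowest-varying side and k_3 the fastest, so index 0 is the
    equilateral configuration with all sides equal to n_min.
    """
    sides = [n_min + i * n_step for i in range(n_bins)]
    triangles = []
    for n1 in sides:
        for n2 in sides:
            if n2 < n1:
                continue
            for n3 in sides:
                if n2 <= n3 <= n1 + n2:
                    triangles.append((n1, n2, n3))
    return triangles


# Hypothetical binning, not the one in Table 2.
tri = label_triangles(n_min=1, n_step=1, n_bins=3)
for index, sides in enumerate(tri):
    print(index, sides)
```

Because the outer loop runs over $k_{1}$ , a low triangle index corresponds to configurations whose shortest side lies on the largest scales.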

In our measurements of the integrated bispectrum we split the simulation box into $125$ subcubes, corresponding to a side of $300\,h^{-1}\,\text{Mpc}$ . This increases $k_{\text{min}}$ by a factor of five compared to the full box. Finally, for the line correlation function we use a non-regular $r$ -spacing, spanning the range from $10$ to $200\,h^{-1}\,\text{Mpc}$ . The first seven bins are separated by $2.5\,h^{-1}\,\text{Mpc}$ , which doubles to $5\,h^{-1}\,\text{Mpc}$ for the next eleven and to $10\,h^{-1}\,\text{Mpc}$ for the remaining twelve bins.
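The $r$ -binning can be written out explicitly. The sketch below reads the three groups of bins as consecutive runs with spacings of 2.5, 5 and 10 $h^{-1}\,\text{Mpc}$ starting from $r=10\,h^{-1}\,\text{Mpc}$ ; this reading is an assumption, although it does reproduce the quoted end point. The function name is hypothetical.

```python
def line_correlation_r_bins():
    """Build 30 r-values (in Mpc/h) from the spacings quoted in the text."""
    bins = [10.0 + 2.5 * i for i in range(7)]                # 10 ... 25
    bins += [bins[-1] + 5.0 * (i + 1) for i in range(11)]    # 30 ... 80
    bins += [bins[-1] + 10.0 * (i + 1) for i in range(12)]   # 90 ... 200
    return bins


r_bins = line_correlation_r_bins()
print(len(r_bins), r_bins[0], r_bins[-1])

# Subcube geometry for the integrated bispectrum: 5 subcubes per side.
SUBCUBES_PER_SIDE = 5
print(SUBCUBES_PER_SIDE ** 3, 1500.0 / SUBCUBES_PER_SIDE)
```

The finer spacing at small $r$ concentrates the bins where the line correlation function varies most rapidly.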

5 Comparison of theoretical predictions and simulations

In this section we present estimates of the typical values for each bispectrum proxy introduced in Section 2, and implemented using the formulae of Section 4. We derive these from the 200 simulations of our fiducial cosmology in set (1)—see Section 4.1—at redshifts $z=0$ , $z=0.52$ and $z=1$ . Also, using the simulation set (2) we determine how each proxy responds to changes in the cosmological parameters (Section 5.2). These measurements enable us to characterize the accuracy of the theoretical predictions for these typical values discussed in Section 3. Finally, in Section 5.3 we discuss measurements of the covariances and cross-covariances for each pair of proxies.

5.1 Mean values in the fiducial cosmology

5.1.1 Comparison of measurements and theoretical predictions

In Figs. 1–4 we show measurements of each proxy for all three redshifts, averaged over the $200$ different realizations. We do not explicitly display our power spectrum measurements, which have been well-studied by previous authors (e.g. Makino et al., 1992; Lokas et al., 1996; Scoccimarro & Frieman, 1996; Scoccimarro et al., 1998; Scoccimarro et al., 2001; Smith et al., 2003; Seljak, 2000; Peacock & Smith, 2000; Scoccimarro & Sheth, 2002; Mead et al., 2015). In each figure, the top row contrasts our N-body measurements with the tree-level, one-loop and halo model predictions. The middle row displays the one-loop and halo model predictions relative to the tree-level prediction, and the bottom row shows the difference between the N-body measurements and the theoretical prediction in units of the standard deviation of the N-body estimate.

**Fourier bispectrum.—**We find that both of the SPT predictions are more accurate at large scales and high redshifts. The halo model prediction is a better match at low redshift. The differences between each theoretical estimate and the typical values measured from simulation are broadly consistent with previous analyses; see Scoccimarro et al. (1998); Scoccimarro et al. (2001); Schmittfull et al. (2013); Lazanu et al. (2016).

**Modal bispectrum.—**In Fig. 2 we plot the Fourier bispectrum reconstructed from (7) using our measurements of the $\beta_{n}^{Q}$ coefficients. This is easier to interpret than the $\beta$ -values themselves. The scatter between predicted and measured values (most clearly visible in the bottom row) is similar to the scatter for the directly-measured Fourier bispectrum (Fig. 1), and indicates that differences between the reconstructed and directly-measured values are small. We give a more detailed analysis of the accuracy of the modal bispectrum in Section 5.1.2.

**Integrated bispectrum.—**We give values for the normalized integrated bispectrum in Fig. 3. Except for a few $k$ -bins the error bars are too large to show any preference for a particular theoretical model. In contrast to Figs. 1–2, the bottom row shows that tree-level SPT is a good match to the measured $ib$ at all three redshifts. Conversely, the halo model prediction is a better match at high redshift. Our theoretical predictions are consistent with those reported by Chiang et al. (2014), but our measured values have larger error bars because we work with a smaller simulation volume.

**Line correlation function.—**Finally, we present our measurements of the line correlation function in Fig. 4. The one-loop and halo-model predictions appearing here are new, and have not previously been studied. The most striking feature is the discrepancy between the halo model and SPT-based predictions in the smallest $r$ -bins. This is consistent with the analyses of Wolstenhulme et al. (2015) and Eggemeier & Smith (2017), which both found differences between the tree-level prediction and values measured from simulation on scales with $r\lesssim 30\,h^{-1}\,\text{Mpc}$ . The agreement is good for larger $r$ .

**Theory error.—**The bottom panels of Figs. 1–4 show that our theoretical predictions are accurate within a restricted range of scales. Outside this range it becomes progressively more difficult to model the observables. This mis-modelling should be regarded as an additional source of systematic error—a theory error—when forecasting constraints, or analysing data, using any of these theoretical models. In particle phenomenology such theory errors are routinely estimated when performing fits to data, but their use in cosmology is less common. In this paper we construct Fisher forecasts for parameter error bars using both SPT-based models and the halo model. Comparison of these error bars enables us to estimate the impact of theoretical uncertainties on future constraints that incorporate three-point statistics (see Section 7.4).

An alternative prescription for estimating theory errors was used by Baldauf et al. (2016) and Welling et al. (2016). In their approach the theoretical uncertainty in one-loop SPT is estimated from the next-order term in the loop expansion. We find that this prescription gives noticeably larger estimates than the difference between one-loop SPT and the values we measure from simulations. Therefore, although Baldauf et al. (2016) and Welling et al. (2016) concluded that (for example) constraints on some types of primordial non-Gaussianity would be weakened significantly after accounting for theory errors, our numerical comparison suggests that the attainable error may degrade by less than their analysis would suggest.

5.1.2 Accuracy of modal reconstruction

Comparison of Figs. 1 and 2 demonstrates that the Fourier bispectrum reconstructed from our measurements of the $\beta^{Q}_{n}$ accurately reproduces the correct amplitude and shape dependence. This information is embedded in the modal coefficients. For example, the zeroth basis mode $R_{0}\propto Q_{0}$ is a constant and therefore $\beta^{R}_{0}\propto\beta^{Q}_{0}$ captures information about the mean amplitude of the Fourier bispectrum over all configurations—or, equivalently, the skewness of $\delta$ . The next few modes are slowly varying functions of configuration. Taken together, these low-order modes carry the principal amplitude information and for reasonably smooth bispectra we expect they exhibit the strongest dependence on background cosmological parameters. The higher modes capture more subtle detail. As with any basis decomposition, their inclusion increases the accuracy of the reconstruction.

To see this in detail, consider a reconstruction using only $n_{\text{max}}=10$ modes. In Fig. 5 we plot the Fourier bispectrum reconstructed in this way (blue line) compared to the reconstruction using $n_{\text{max}}=50$ described above (red line). Black crosses mark the measured data points. In the lower panel we plot the ratio between these measured values and the reconstructions. The accuracy is good whether we use $n_{\text{max}}=10$ or $n_{\text{max}}=50$ , but the scatter is smaller for $n_{\text{max}}=50$ . We conclude that, in this case, the first 10 modes are sufficient to capture the main behaviour of the Fourier bispectrum, but extra modes are helpful if we wish to reproduce the precise configuration dependence to within $\lesssim 10\%$ accuracy.

5.2 Derivatives with respect to cosmological parameters

In the remainder of this paper our aim is to obtain Fisher forecasts of error bars for a parameter set $\theta_{\alpha}$ , where the index $\alpha$ labels one of the cosmological parameters of Table 1. For this purpose the role of a theoretical model is to predict the derivatives of observables with respect to each parameter, and the accuracy of the forecast depends on the reliability of these predictions. In this section we study how well our three theoretical models reproduce the derivatives estimated from our simulation suite. We compute the derivative of some estimator $\hat{X}$ at wavenumber $k$ with respect to a parameter $\theta_{\alpha}$ by the rule

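$$\frac{\text{d}\hat{X}(k)}{\text{d}\theta_{\alpha}}=\hat{\bar{X}}(k|\bm{\theta})\,\frac{\text{d}\ln\hat{X}(k)}{\text{d}\theta_{\alpha}},\tag{50}$$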

where $\hat{\bar{X}}(k|\bm{\theta})$ is the average over the $200$ fiducial simulations of set (1) (described in Section 4.1) for $X\in\{P,B,\beta,ib,\ell\}$ , and the logarithmic derivative with respect to $\theta_{\alpha}$ is computed using

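$$\frac{\text{d}\ln\hat{X}(k)}{\text{d}\theta_{\alpha}}=\frac{1}{4}\sum_{i=1}^{4}\frac{\hat{X}^{(i)}(k\mid\bm{\theta}+\Delta\theta_{\alpha})-\hat{X}^{(i)}(k\mid\bm{\theta}-\Delta\theta_{\alpha})}{2\Delta\theta_{\alpha}\,\hat{X}^{(i)}(k\mid\bm{\theta})}.\tag{51}$$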

The sum is over the four realizations used in simulation set (2), and the derivative is constructed using the $+\Delta\theta_{\alpha}$ and $-\Delta\theta_{\alpha}$ offset simulations described in Section 4.1. The advantage of the logarithmic derivative is that both realizations in the numerator on the right-hand side of (51) share initial conditions with their fiducial partner in the denominator. Therefore, division by the fiducial estimate $\hat{X}^{(i)}(k\mid\bm{\theta})$ minimizes dependence on the specific realization. [Footnote 4: This strategy is less successful for the line correlation function. In this case the fiducial value could be very close to zero on some scales. In turn, this produces large errors in the logarithmic derivative. Therefore, for the line correlation function, we estimate the linear derivative $\text{d}\ell/\text{d}\theta_{\alpha}$ instead.]
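The cancellation of realization-dependent scatter can be seen in a toy example. The sketch below implements a central-difference logarithmic derivative averaged over matched realizations, then rescales it by a fiducial mean. The function names and the toy model $X=A_{i}\theta^{2}$ , in which the amplitude $A_{i}$ mimics scatter between realizations, are assumptions made purely for illustration.

```python
def log_derivative(x_plus, x_minus, x_fid, step):
    """Central-difference estimate of d ln X / d theta, averaged over realizations.

    x_plus[i], x_minus[i], x_fid[i] are measurements from realization i of the
    +step, -step and fiducial runs, which share initial conditions.
    """
    n = len(x_fid)
    return sum(
        (x_plus[i] - x_minus[i]) / (2.0 * step * x_fid[i]) for i in range(n)
    ) / n


def derivative(x_fid_mean, x_plus, x_minus, x_fid, step):
    """Rescale the logarithmic derivative by the mean of the large fiducial set."""
    return x_fid_mean * log_derivative(x_plus, x_minus, x_fid, step)


# Toy check with X = A_i * theta^2, where A_i mimics realization-dependent scatter.
theta, step = 0.8, 0.05
amplitudes = [0.9, 1.0, 1.1, 1.05]
x_fid = [a * theta ** 2 for a in amplitudes]
x_plus = [a * (theta + step) ** 2 for a in amplitudes]
x_minus = [a * (theta - step) ** 2 for a in amplitudes]
dlnx = log_derivative(x_plus, x_minus, x_fid, step)
print(dlnx, 2.0 / theta)
```

The amplitudes drop out of every term in the sum, so the estimate matches $2/\theta$ regardless of the scatter. Dividing by a fiducial value close to zero would instead amplify noise, which is why the linear derivative is preferable when the fiducial signal can vanish.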

In Fig. 6 we plot the derivatives of each observable with respect to the cosmological parameters at $z=0.52$ . Our forecasts use three redshift bins, but their behaviour is similar to the $z=0.52$ bin and the statements made below can be taken to apply at all three redshifts. We do not include the power spectrum, for which the derivatives appeared in Smith et al. (2014).

**Modal bispectrum.—**To simplify comparison of the modal bispectrum with the Fourier bispectrum, Fig. 6 plots derivatives of the reconstructed bispectrum rather than derivatives of $\beta_{n}^{Q}$ or $\beta_{n}^{R}$ . Comparison of the first two columns shows that the cosmology-dependence is accurately captured using $n_{\text{max}}=50$ , either for theoretical predictions or the measured values.

There is a significant spread in performance of the theoretical models, with tree-level SPT and the halo model generally offering the poorest match. For the derivatives with respect to $\Omega_{m}$ , $\Omega_{b}$ , $n_{s}$ and $h$ these models give similar predictions. The probable reason is that, in the standard halo model, the halo mass function and halo profile are fixed to the fiducial cosmology. Only the input power spectrum is taken to vary with the cosmological parameters, and since it matches the tree-level SPT prediction its derivatives will be equal. Therefore the halo-model derivatives will differ from those of tree-level SPT only via a (possibly scale-dependent) prefactor. More complex halo models with cosmology-dependent halo parametrizations have been studied (see, e.g., Mead et al., 2016, for an application to dark energy models). However, determining which variation of the halo model captures the cosmological parameter dependence of the bispectrum most accurately is outside the scope of this paper. We simply note that, if the halo model is to be used for analysis or forecasting of the Fourier bispectrum, its implementation should be chosen with care because its performance depends on these details.

**Integrated bispectrum.—**The derivatives of the integrated bispectrum are shown in the third column of Fig. 6. The error bars on the measured values are again too large to show a clear preference for any model—and they are generally so large that the measurement is not significantly different from zero. These results are consistent with those reported by Chiang (2015) for a range of values of $\Omega_{m}$ , $\sigma_{8}$ and $n_{s}$ . We conclude that the integrated bispectrum is rather insensitive to the background cosmology and is therefore a comparatively poor tool to constrain it. While this means we must expect a Fisher forecast to predict weaker constraints on the parameters of Table 1, this insensitivity could be an advantage if the intention is to use the integrated bispectrum as a probe of other physics. For example, in addition to the background cosmology we may wish to use the large-scale structure bispectrum to constrain primordial three-point correlations produced by inflation on squeezed configurations. Insensitivity to the background cosmology would reduce the likelihood of degeneracies in these measurements.

**Line correlation function.—**The last column of Fig. 6 shows the derivatives of the line correlation function. As for the typical values discussed above, the values predicted by our theoretical models are significantly discrepant with the measured values in the smallest $r$ bins. Also, the derivative with respect to the dark energy parameter $w_{a}$ is particularly discrepant for the halo model. One possible explanation is the construction of the halo model as described above, with its fixed halo mass function and halo profile. Alternatively, it is possible that the halo model power spectrum and bispectrum that we use are subtly inconsistent in a way that produces inaccuracies in the line correlation function on small scales.

5.3 Non-Gaussian covariance

The analytic, Gaussian covariance of each proxy is most accurate at high redshifts and on large scales, where the matter fluctuations are more nearly Gaussian and therefore more accurately described by the power spectrum alone. At low redshifts and on small scales, however, the Gaussian approximation fails due to non-linear evolution of matter fluctuations. This evolution generates additional contributions to the covariance through higher-order $n$ -point correlations.

The simplest and most robust approach to obtain accurate non-Gaussian covariances has been to analyse large suites of N-body simulations. This method was used by Takahashi et al. (2009), Takahashi et al. (2011), Blot et al. (2016), and Klypin & Prada (2017) to study the non-Gaussian covariance of the power spectrum. Other authors have performed analogous studies for the bispectrum (Sefusatti et al., 2006; Chan & Blot, 2016), the real-space partner of the integrated bispectrum (Chiang et al., 2015), and the line correlation function (Eggemeier & Smith, 2017). In this section, we present our measurements of the non-Gaussian covariance for each proxy, estimated from our suite of simulations. We also discuss the cross-covariance between pairs of proxies.

In Sections 6 and 7 we quantify the impact of these complex non-diagonal covariances on estimates of signal-to-noise and Fisher forecasts.

**Correlation matrices.—**We plot correlation matrices for the measurements $P+B$ , $P+\beta$ , $P+ib$ , and $P+\ell$ in Fig. 7. We show measurements only at $z=0$ where differences between the Gaussian and non-Gaussian covariances are largest.

The correlation coefficient $\bm{\mathsf{r}}_{ij}$ between two data bins $i$ and $j$ is defined to satisfy $\bm{\mathsf{r}}_{ij}\equiv\hat{\bm{\mathsf{C}}}_{ij}/\sqrt{\hat{\bm{\mathsf{C}}}_{ii}\,\hat{\bm{\mathsf{C}}}_{jj}}$ , where $\hat{\bm{\mathsf{C}}}$ is the covariance matrix estimated from the simulation suite,

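$$\hat{\bm{\mathsf{C}}}_{ij}=\frac{1}{N_{\text{real}}-1}\sum_{n=1}^{N_{\text{real}}}\big(S^{(n)}_{i}-\bar{S}_{i}\big)\big(S^{(n)}_{j}-\bar{S}_{j}\big),\qquad\bar{S}_{i}\equiv\frac{1}{N_{\text{real}}}\sum_{n=1}^{N_{\text{real}}}S^{(n)}_{i},\tag{52}$$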

and $N_{\text{real}}=200$ is the number of realizations. To measure an auto-covariance the data vector $S$ contains all measurements of a single proxy, $S=(X_{a,1},\ldots,X_{a,n})$ , or to measure a cross-covariance it contains all measurements from a pair, $S=(X_{a,1},\ldots,X_{a,n_{1}},X_{b,1},\ldots,X_{b,n_{2}})$ , where $X_{a},X_{b}\in\{P,B,\beta,ib,\ell\}$ . The correlation matrix measures the degree of coupling between different measurements. Its elements take values between $-1$ (where the bins are fully anti-correlated) and $+1$ (where the bins are fully correlated). A value of zero corresponds to uncorrelated measurements. For comparison, the Gaussian covariance matrices for $P$ , $B$ , $\beta$ and $ib$ are diagonal, whereas for $\ell$ there are correlations between neighbouring bins with similar $r$ because it is a real-space statistic and therefore includes contributions from many Fourier configurations. In the Gaussian approximation the cross-covariance between $P$ and any bispectrum proxy is zero.
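The construction of the correlation matrix from a set of realizations can be sketched as follows. The sample covariance here uses the usual unbiased $1/(N_{\text{real}}-1)$ normalization, which is an assumption (it cancels in the correlation coefficient in any case). The function names and the toy data are hypothetical.

```python
import math


def sample_covariance(realizations):
    """Unbiased sample covariance of data vectors (one list per realization)."""
    n_real = len(realizations)
    n_bin = len(realizations[0])
    mean = [sum(s[i] for s in realizations) / n_real for i in range(n_bin)]
    cov = [[0.0] * n_bin for _ in range(n_bin)]
    for s in realizations:
        for i in range(n_bin):
            for j in range(n_bin):
                cov[i][j] += (s[i] - mean[i]) * (s[j] - mean[j]) / (n_real - 1)
    return cov


def correlation_matrix(cov):
    """r_ij = C_ij / sqrt(C_ii C_jj)."""
    n = len(cov)
    return [
        [cov[i][j] / math.sqrt(cov[i][i] * cov[j][j]) for j in range(n)]
        for i in range(n)
    ]


# Toy data vectors: bin 1 tracks bin 0 exactly, bin 2 is anti-correlated with it.
data = [[1.0, 2.0, -1.0], [2.0, 4.0, -2.0], [3.0, 6.0, -3.0], [4.0, 8.0, -4.0]]
r = correlation_matrix(sample_covariance(data))
for row in r:
    print([round(x, 3) for x in row])
```

For a cross-covariance the same functions apply unchanged once the measurements of the two statistics are concatenated into a single data vector per realization.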

**Fourier bispectrum.—**For $P+B$ (upper-left panel of Fig. 7) the correlation matrix has an approximate block structure due to the ordering of the 95 triangle configurations that we measure. The blocks correspond to groups of adjacent configurations with shared values of $k_{1}$ or $k_{2}$ . While the power spectrum $P(k)$ shows mild correlations between different bins at high $k$ , the bispectrum exhibits much stronger correlations. There are also non-zero cross-correlations between power spectrum and bispectrum bins. The correlation between power spectrum and bispectrum tends to be higher when $P(k)$ and $B(k_{1},k_{2},k_{3})$ have wavenumber bins that overlap. Similarly, the correlation between different bispectrum bins is higher when the configurations share at least one wavenumber. However, even configurations that have no wavenumbers in common can be strongly correlated, with correlation coefficient as large as $\sim 0.8$ , due to non-linear growth.

**Modal bispectrum.—**In the upper-right panel of Fig. 7 we present measurements of the correlation coefficients for $P+\beta^{R}$ . These have not previously been reported. As explained in Section 3.3 these measurements apply to the $R$ -basis, for which the covariance matrix is constructed to be diagonal in the Gaussian approximation. We find that only the first two modes are correlated with the majority of $P(k)$ bins. This is reasonable because the lowest modes probe the most scale-independent features of the phase bispectrum. The remainder show low-to-moderate correlation or anti-correlation due to non-linear effects.

**Integrated bispectrum and line correlation function.—**Correlation measurements for the integrated bispectrum appear in the lower-left panel of Fig. 7. The $ib(k)$ measurements show stronger auto-correlations than $P(k)$ as $k$ increases, while the $P\times ib$ cross-correlation is relatively featureless. This indicates that the two data sets are nearly independent. Similarly, we find that the $P\times\ell$ cross-correlation is nearly featureless except where the smallest $r$ bins and highest $k$ bins show significant correlation. Relative to the Gaussian covariance matrix for $\ell$ , the $r$ bins with $r\lesssim 50\,h^{-1}\,\text{Mpc}$ are more strongly correlated due to non-linear growth.

**Cross-covariances.—**Finally, we have computed the correlation matrices between the bispectrum and its proxies. These enable us to identify which bispectrum configurations contribute most to individual bins of $\beta^{R}$ , $ib$ or $\ell$ . We find that the first two $\beta^{R}$ modes are strongly correlated with the bispectrum over a large range of triangles, while the remainder are generally more correlated with triangles on the largest scales (that is, lower triangle index). This structure is similar to the $P+\beta^{R}$ correlation matrix.

We find that $B$ and $ib$ are very weakly correlated, which we attribute to $ib$ being dominated by more strongly squeezed triangles than any we include in the 95 measured configurations of $B$ . Finally, the line correlation function is correlated with a majority of bispectrum configurations when $r\lesssim 40\,h^{-1}\,\text{Mpc}$ . This indicates that the line correlation function is sensitive to many different shapes of Fourier triangle. We do not find particularly strong correlations for $\ell\times ib$ , but $\ell\times\beta^{R}$ shows that the line correlation function at small $r$ is highly correlated with the first two $\beta^{R}$ modes. This is consistent with the observation that both are sensitive to a wide range of Fourier configurations.

6 Cumulative signal-to-noise of the bispectrum proxies

Before discussing the constraining power of each proxy we first compute the available signal-to-noise. This is an intermediate step that characterizes the significance with which measurements of each proxy can be extracted from a data set. Negligible signal-to-noise would normally imply poor prospects for parameter constraints. For example, Chan & Blot (2016) and Kayo et al. (2013) studied the signal-to-noise as a proxy for the information content of the Fourier bispectrum in the context of large-scale structure and weak lensing, respectively.

**Numerical procedure.—**The cumulative signal-to-noise $\mathcal{S}/\mathcal{N}$ up to a maximum wavenumber $k_{\text{max}}$ is defined by

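$$\left(\frac{\mathcal{S}}{\mathcal{N}}\right)^{2}=\sum_{k_{i},k_{j}\leqslant k_{\text{max}}}S_{i}\,\bm{\mathsf{C}}^{-1}_{ij}\,S_{j},\tag{53}$$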

where $S$ is the vector of typical values for either a single proxy or a combination of proxies, defined below equation (52). In this and subsequent sections we drop the use of a hat to denote an estimated value, and an overbar to denote a mean. The sum in (53) runs over all bins containing wavenumbers that satisfy the condition $k\leqslant k_{\text{max}}$ . For the Fourier bispectrum a bin corresponds to a triplet of wavenumbers $(k_{1},k_{2},k_{3})$ , all of which are required to be smaller than $k_{\text{max}}$ .
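The restriction of the sum to bins with $k\leqslant k_{\text{max}}$ can be sketched in a few lines of Python. This is a minimal illustration that assumes the usual quadratic form $S^{\mathsf{T}}\bm{\mathsf{C}}^{-1}S$ for the squared signal-to-noise. The function name `cumulative_snr` and the array layout are hypothetical and are not part of the analysis pipeline described here.

```python
import numpy as np

def cumulative_snr(signal, cov, k_bins, k_max):
    """Cumulative signal-to-noise using only bins with k <= k_max.

    signal : (N,) typical values S_i
    cov    : (N, N) covariance matrix of the same bins
    k_bins : (N,) largest wavenumber attached to each bin; for a
             bispectrum triangle this would be max(k1, k2, k3)
    """
    keep = np.asarray(k_bins) <= k_max
    if not keep.any():
        return 0.0
    s = np.asarray(signal, dtype=float)[keep]
    c = np.asarray(cov, dtype=float)[np.ix_(keep, keep)]
    # Solve C x = S instead of forming the inverse explicitly.
    return float(np.sqrt(s @ np.linalg.solve(c, s)))
```

Evaluating this function on an increasing grid of `k_max` values would give a cumulative curve of the kind plotted against $k_{\text{max}}$. Note that the sub-block of the covariance is inverted afresh for each cutoff, rather than truncating a single global inverse.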

We use the non-Gaussian covariance matrix measured from simulations, described in Section 5.3, which we denote by $\bm{\mathsf{C}}_{*}$ . Its inverse $\bm{\mathsf{C}}^{-1}_{*}$ is not an unbiased estimator of $\bm{\mathsf{C}}^{-1}$ . A simple prescription to approximately correct for this bias is to rescale $\bm{\mathsf{C}}^{-1}_{*}$ by an Anderson–Hartlap factor (Anderson, 2003; Hartlap et al., 2006), which yields

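$$\bm{\mathsf{C}}^{-1}\approx\frac{N_{\text{real}}-N_{\text{bin}}-2}{N_{\text{real}}-1}\,\bm{\mathsf{C}}^{-1}_{*},\tag{54}$$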

where $N_{\text{real}}$ is the number of realizations used to estimate the covariance matrix and $N_{\text{bin}}$ is its dimensionality. (Although the Anderson–Hartlap prescription is simple to apply, Sellentin & Heavens (2016) pointed out that this rescaling simply broadens the Gaussian likelihood of the data. These authors argued that the distribution of the data is more accurately modelled by a $t$-distribution.) Care should be taken when computing the numerical inverse $\bm{\mathsf{C}}^{-1}_{*}$, especially for combinations of measurements with signals of widely disparate magnitude. To avoid issues associated with ill-conditioning we first compute the correlation matrix $\bm{\mathsf{r}}_{*,ij}=\bm{\mathsf{C}}_{*,ij}/\sqrt{\bm{\mathsf{C}}_{*,ii}\,\bm{\mathsf{C}}_{*,jj}}$, whose entries lie between $-1$ and $+1$. We determine the inverse $\bm{\mathsf{r}}^{-1}_{ij}$ using a singular value decomposition and check that all singular values are above the noise. Finally, we compute the inverse covariance using

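$$\bm{\mathsf{C}}^{-1}_{*,ij}=\frac{\bm{\mathsf{r}}^{-1}_{ij}}{\sqrt{\bm{\mathsf{C}}_{*,ii}\,\bm{\mathsf{C}}_{*,jj}}}.\tag{55}$$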
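The inversion procedure just described (normalize to a correlation matrix, invert by singular value decomposition, rescale, then apply the Anderson–Hartlap factor) can be sketched as follows. This is an illustrative sketch only: the function name `debiased_inverse_covariance` is hypothetical, and the relative threshold `rcond` is an assumed stand-in for the check that all singular values lie "above the noise", whose precise criterion is not specified in the text.

```python
import numpy as np

def debiased_inverse_covariance(cov_est, n_real, rcond=1e-10):
    """Invert an estimated covariance via its correlation matrix.

    cov_est : (N, N) covariance estimated from n_real realizations
    rcond   : assumed relative floor on singular values (hypothetical)
    """
    cov_est = np.asarray(cov_est, dtype=float)
    n_bin = cov_est.shape[0]
    if n_real <= n_bin + 2:
        raise ValueError("Anderson-Hartlap factor needs n_real > n_bin + 2")

    # Correlation matrix: entries lie in [-1, 1] whatever the signal units.
    sigma = np.sqrt(np.diag(cov_est))
    corr = cov_est / np.outer(sigma, sigma)

    # Invert through the SVD and refuse to proceed if it is near-singular.
    u, s, vt = np.linalg.svd(corr)
    if s.min() < rcond * s.max():
        raise np.linalg.LinAlgError("correlation matrix is numerically singular")
    corr_inv = (vt.T / s) @ u.T

    # Undo the normalization, then rescale by the Anderson-Hartlap factor.
    inv_cov = corr_inv / np.outer(sigma, sigma)
    hartlap = (n_real - n_bin - 2) / (n_real - 1)
    return hartlap * inv_cov
```

Working with the correlation matrix means the conditioning of the inversion no longer depends on the relative amplitudes of, for example, $P$ and $B$, which may differ by many orders of magnitude.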

**Results.—**In Fig. 8 we plot the resulting signal-to-noise measurements for the Fourier bispectrum, integrated bispectrum, line correlation function and the quantity $B_{\epsilon}$ defined in (16) and used in the construction of the line correlation function and the modal bispectrum. (The signal-to-noise from $B_{\epsilon}$ and the reconstructed modal bispectrum give almost identical results.) We estimate $B_{\epsilon}$ using the prescription

[TABLE]

Each panel of Fig. 8 shows the cumulative signal-to-noise of the Fourier bispectrum or a proxy (blue circles), together with the power spectrum (black crosses) and their combination including the cross-covariance matrix (red stars). The first four data points in the $B$ and $B_{\epsilon}$ panels use a bin size $\Delta k=2k_{\mathrm{f}}$ in order to probe the low- $k$ regime. The remainder derive from the measurements presented in Section 5 and use $\Delta k=8k_{\mathrm{f}}$ . Our measurements of the integrated bispectrum and line correlation function carry forward the binning procedure used in Section 5. The step-like structure that occurs for $P+\ell$ is due to a mismatch of scales between the power spectrum and the bins of the line correlation function. In each panel, for comparative purposes, we plot lines of matching colour to show the signal-to-noise computed using a Gaussian approximation to the covariance matrix and tree-level SPT to evaluate any correlation measures it contains.

**Discussion.—**First, we note that the Gaussian approximation overpredicts the signal-to-noise for each proxy $X$ and its combination $P+X$ with the power spectrum. This is consistent with the results reported by Chan & Blot (2016). The overprediction occurs because bins become coupled by non-linear evolution, and therefore do not provide independent information as the Gaussian approximation assumes. The effect can be quite severe: while the power spectrum signal-to-noise at $k_{\text{max}}=0.3\,h\,\text{Mpc}^{-1}$ is overpredicted by a factor of three, the impact on the Fourier bispectrum and its proxies is much larger. In these cases the overprediction ranges from a factor of $\sim 5$ or $8$ for $ib$ and $\ell$ up to more than an order of magnitude for the Fourier bispectrum. At smaller $k_{\text{max}}$ the overprediction is less, becoming significant for $k_{\text{max}}\gtrsim 0.1\,h\,\text{Mpc}^{-1}$ .

The Fourier bispectrum, phase bispectrum, and line correlation function individually contribute $\sim 30\%$ of the signal-to-noise of $P(k)$ at $k_{\text{max}}=0.3\,h\,\text{Mpc}^{-1}$ , while the integrated bispectrum achieves only $5\%$ of the $P(k)$ signal-to-noise. For the Fourier bispectrum, this result is consistent with Chan & Blot (2016).

However, for estimating parameter constraints from the joint combination of $P$ and $B$ , or one of its proxies, the individual signal-to-noise contributed by one of these measurements is less important than whether it contains information that is not already present in the power spectrum. This is determined by the signal-to-noise of the combination $P+X$ compared to $P$ alone. The different proxies show significant variation in the improvement from use of $P+X$ , which we indicate as a percentage in the bottom-right corner of each panel. Although $B$ , $B_{\epsilon}$ and $\ell$ individually carry roughly the same signal-to-noise, the uplift in $P+X$ varies from $\sim 91\%$ to $\sim 11\%$ . Note that the signal-to-noise of $P+B$ receives a large improvement from the cross-covariance, which was ignored in Chan & Blot (2016).

The discrepancy in uplift between $B$ and $B_{\epsilon}$ is striking. If this discrepancy were to carry over to parameter constraints it would imply that the Fourier bispectrum carries significantly more constraining power than $B_{\epsilon}$ , even though both statistics are equivalent in the approximation of Gaussian covariance. If true, this would be very surprising. We return to this question in Section 7.5 after we have obtained forecast parameter uncertainties for $B$ and its proxies, which enable us to precisely quantify the constraining power of each statistic.

7 Parameter uncertainty forecasts

In this section we collect our major results, which are Fisher forecasts of the error bars achievable on the parameter set $\theta_{\alpha}=(\Omega_{m},\Omega_{b},w_{0},w_{a},\sigma_{8},n_{s},h)$ of Table 1, based on a fiducial flat $\Lambda$CDM cosmology. We perform these forecasts with and without inclusion of the bias parameters $(b_{1},b_{2})$.

In Section 7.1 we summarize our implementation of the Fisher forecasting method, and in Section 7.2 we present and compare the forecasts from each proxy. By comparing forecasts with and without non-Gaussian covariances, and using different theoretical models to describe the dark matter density, we are able to characterize their influence on the final parameter constraints. These discussions appear in Sections 7.3 and 7.4, respectively. Finally, we return to the discussion of Section 6 and examine to what extent the signal-to-noise provides a reliable metric by which to estimate improvements in parameter constraints (Section 7.5).

7.1 Forecasting method

The Fisher formalism can be used to forecast the precision with which cosmological parameters could be measured in a future survey. Consider a data vector $\bm{\mathrm{x}}$ containing measurements of any combination of statistical quantities. The likelihood function $\mathcal{L}(\bm{\theta}\mid\bm{\mathrm{x}})$ is defined to be the probability of the data given the parameters $\bm{\theta}$ , so $\mathcal{L}(\bm{\theta}\mid\bm{\mathrm{x}})=P(\bm{\mathrm{x}}\mid\bm{\theta})$ . Then the Fisher matrix $\bm{\mathsf{F}}_{\alpha\beta}$ satisfies

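$$\bm{\mathsf{F}}_{\alpha\beta}=-\left\langle\frac{\partial^{2}\ln\mathcal{L}}{\partial\theta_{\alpha}\,\partial\theta_{\beta}}\right\rangle.\tag{57}$$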

The expected $1\sigma$ error on each parameter $\theta_{\alpha}$ , marginalized over all other parameters, can be obtained from the diagonal elements of the inverse Fisher matrix using $\sigma^{2}(\theta_{\alpha})=(\bm{\mathsf{F}}^{-1})_{\alpha\alpha}$ . To simplify the computation of $\bm{\mathsf{F}}_{\alpha\beta}$ we make the assumption that the likelihood function is a multivariate Gaussian,

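$$\mathcal{L}\propto\frac{1}{\sqrt{|\bm{\mathsf{C}}|}}\exp\left[-\frac{1}{2}(\bm{\mathrm{x}}-\bm{\mu})^{\mathsf{T}}\,\bm{\mathsf{C}}^{-1}\,(\bm{\mathrm{x}}-\bm{\mu})\right],\tag{58}$$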

where $\mathsf{T}$ denotes a matrix transpose and $|\bm{\mathsf{C}}|=\det\bm{\mathsf{C}}$ is the determinant of $\bm{\mathsf{C}}$ . We have written the mean of the data vector as $\bm{\mu}=\langle\bm{\mathrm{x}}\rangle$ , and its covariance matrix is $\bm{\mathsf{C}}_{ij}=\langle x_{i}\,x_{j}\rangle-\mu_{i}\,\mu_{j}$ . With these assumptions it can be shown that (Tegmark et al., 1997),

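$$\bm{\mathsf{F}}_{\alpha\beta}=\frac{1}{2}\operatorname{tr}\left[\bm{\mathsf{C}}^{-1}\frac{\partial\bm{\mathsf{C}}}{\partial\theta_{\alpha}}\bm{\mathsf{C}}^{-1}\frac{\partial\bm{\mathsf{C}}}{\partial\theta_{\beta}}\right]+\frac{\partial\bm{\mu}^{\mathsf{T}}}{\partial\theta_{\alpha}}\,\bm{\mathsf{C}}^{-1}\,\frac{\partial\bm{\mu}}{\partial\theta_{\beta}}.\tag{59}$$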

The first term measures variation of the covariance matrix with respect to the parameters, which is often a smaller effect than the variation of the means represented by the second term. In the approximation that this first term may be neglected the Fisher matrix can be computed in terms of the inverse covariance matrix for the fiducial model. Our procedure to obtain this matrix from the simulation suite has already been described in Sections 5 and 6.
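When the covariance-derivative term is neglected, the Fisher matrix reduces to a quadratic form in the derivatives of the mean. The following Python sketch illustrates this reduced form and the extraction of marginalized errors. The function names and the `(n_params, n_bins)` layout of the derivative array are assumptions made for illustration.

```python
import numpy as np

def fisher_matrix(dmu_dtheta, inv_cov):
    """Mean-only Fisher matrix F_ab = (dmu/dtheta_a)^T C^{-1} (dmu/dtheta_b).

    dmu_dtheta : (n_params, n_bins) derivatives of the mean data vector
    inv_cov    : (n_bins, n_bins) inverse covariance at the fiducial model
    """
    d = np.asarray(dmu_dtheta, dtype=float)
    return d @ np.asarray(inv_cov, dtype=float) @ d.T

def marginalized_errors(fisher):
    """1-sigma errors marginalized over all other parameters."""
    return np.sqrt(np.diag(np.linalg.inv(fisher)))

def conditional_errors(fisher):
    """1-sigma errors with all other parameters held fixed."""
    return 1.0 / np.sqrt(np.diag(fisher))
```

The marginalized error is never smaller than the conditional one. The gap between them is a simple diagnostic of how strongly a parameter is degenerate with the others, as is the case for $\sigma_{8}$ and $b_{1}$ when only the power spectrum is used.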

**Survey configuration.—**The Fisher formalism depends explicitly on details of the survey under discussion, both through the specification of the data vector $\bm{\mathrm{x}}$ —such as how many redshift bins are used and which Fourier configurations are included—and the properties of the covariance matrix $\bm{\mathsf{C}}$ . In the following we adopt the parameters of an idealized survey of large-scale structure consisting of three independent redshift slices at $z=0$ , $z=0.52$ and $z=1$ . Each slice has volume $V=3.375\,h^{-3}\,\text{Gpc}^{3}$ and a mode cutoff at $k_{\text{max}}=0.3\,h\,\text{Mpc}^{-1}$ . The total Fisher matrix can be written as a sum of the Fisher matrix in each slice,

$$\bm{\mathsf{F}}_{\alpha\beta}=\sum_{i=1}^{3}\bm{\mathsf{F}}_{\alpha\beta}(z_{i}).\tag{60}$$

We assume that, in each redshift bin, the number density of galaxies is sufficiently high that the effect of shot noise is small. We do not include redshift-space distortions or the effect of complex survey geometry. In general, all of these effects will be significant for a realistic survey and cannot be neglected. However, in this paper our intention is to address the question of whether the proxies described in Section 2 can be competitive with measurements of the Fourier bispectrum in principle. Survey-specific effects will generally reduce the number of configurations that can be measured, or increase the noise on those for which measurements are possible. This will typically weaken the performance of the proxies, meaning that their neglect gives us an estimate of the best-case scenario. While we do not anticipate that astrophysical or observational systematics will affect any one proxy more than the others, this is an interesting question to explore in future.

Each of the constraints we present includes a prior from the cosmic microwave background power spectrum. We implement this prior by adding a fourth Fisher matrix,

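$$\bm{\mathsf{F}}_{\alpha\beta}=\sum_{i=1}^{3}\bm{\mathsf{F}}_{\alpha\beta}(z_{i})+\bm{\mathsf{F}}^{\text{CMB}}_{\alpha\beta}.\tag{61}$$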

Details of the computation of $\bm{\mathsf{F}}^{\text{CMB}}$ for our choice of fiducial parameters were given by Smith et al. (2014).
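Because independent data sets contribute additively to the Fisher matrix, combining the redshift slices with a prior is a matter of bookkeeping. The sketch below shows one way this could be done when the prior constrains only a subset of the parameters. The assumption that the CMB prior carries no information on the bias parameters, and the names `add_prior` and `prior_index`, are ours for the purpose of illustration.

```python
import numpy as np

def add_prior(fisher_slices, fisher_prior, prior_index):
    """Sum per-slice Fisher matrices and add a prior on a parameter subset.

    fisher_slices : sequence of (n, n) Fisher matrices, one per redshift slice
    fisher_prior  : (m, m) prior Fisher matrix with m <= n
    prior_index   : length-m list of distinct positions of the prior
                    parameters within the full parameter vector
    """
    total = np.sum(np.asarray(fisher_slices, dtype=float), axis=0)
    idx = np.asarray(prior_index)
    # Parameters outside prior_index (e.g. bias) receive no prior information.
    total[np.ix_(idx, idx)] += np.asarray(fisher_prior, dtype=float)
    return total
```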

7.2 Constraining power of the bispectrum and its proxies

In this section we present our forecasts. To minimize modelling errors we construct the Fisher matrix for each proxy using quantities measured from simulation, except for derivatives with respect to the bias parameters which cannot be obtained in this way. For the Fourier bispectrum we compute these derivatives analytically by differentiating the one-loop power spectrum (34) and the tree-level bispectrum (35). Once the derivatives have been obtained we replace occurrences of the dark matter power spectrum and bispectrum with their measured values. Our prescription for the proxies is similar, using the one-loop power spectrum to estimate derivatives of $P(k)$ and tree-level formulae together with the formulae of Section 3 to estimate derivatives of the proxy.

We plot the forecast $1\sigma$ confidence contours in Fig. 9. Each panel shows predicted joint constraints for a pair of parameters after marginalizing over all the others. The grey shaded region marks the constraint predicted from measurements of the power spectrum only, except for inclusion of the CMB prior that we apply to all estimates. The solid dark-blue line marks the constraint predicted from $P+ib$ ; the long-dashed red line marks the constraint predicted from $P+\ell$ ; the short-dashed light-blue line marks the constraint predicted from $P+\beta$ ; and the solid black line marks the constraint predicted from $P+B$ . We summarize the marginalized $1\sigma$ error bars in Table 3. The value in parentheses following each uncertainty indicates the percentage improvement compared to use of $P(k)$ alone.

**Improvement from three-point correlation data.—**First consider the joint constraints from $P+B$ (solid black lines in Fig. 9). These demonstrate that substantial improvements can be achieved compared to measurement of the power spectrum only. This is especially evident for $\sigma_{8}$ and the two bias parameters, for which the improvement is roughly $70\%$ – $80\%$ ; compare the second column of Table 3. This is perhaps unsurprising: the bispectrum constrains a different combination of $\sigma_{8}$ and $b_{1}$ than the power spectrum, and therefore assists in breaking their degeneracy (Fry, 1994; Matarrese et al., 1997). Nevertheless, other parameters that do not participate in this degeneracy also experience improvements in the range $13\%$ – $22\%$ , with the exception of $\Omega_{b}$ . This is already very well-measured by the CMB prior, and large-scale structure measurements can add little new information. These conclusions are similar to those reported by Sefusatti et al. (2006), who suggested that inclusion of Fourier bispectrum measurements could reduce uncertainties on $\Omega_{m}$ and $\sigma_{8}$ by a factor in the range $1.5$ to $2$ .

Next, the forecast for the integrated bispectrum (solid dark-blue lines) shows that it offers negligible improvement, of order $\sim 2\%$ , in comparison to $P$ alone. This is consistent with the very small dependence on cosmological parameters discussed in Section 5.2, and the low signal-to-noise obtained in Section 6. On the other hand, the line correlation function offers comparable constraints to the Fourier bispectrum for $\sigma_{8}$ and $b_{1}$ , which receive improvements of $53\%$ and $68\%$ , respectively. Eggemeier & Smith (2017) demonstrated that this occurs because the line correlation function is nearly independent of $b_{1}$ and therefore probes a different direction in parameter space than $P$ or $B$ . Also, inclusion of $\ell$ measurements increases sensitivity to the dark energy parameters $w_{0}$ and $w_{a}$ by $\sim 9\%$ . These improvements are only marginally degraded compared to those from $P+B$ , which are of order $15\%$ .

Finally, Fig. 9 demonstrates that the modal bispectrum with $n_{\text{max}}=50$ (short-dashed light-blue lines) is predicted to yield error bars nearly equivalent to the Fourier bispectrum with $95$ triangles. Note especially that there is no sign of the significant difference in constraining power between $B$ and $B_{\epsilon}$ —which is the quantity implicitly measured by $\beta$ with our choice of basis—that was suggested by our analysis of signal-to-noise in Section 6. We return to this apparent discrepancy in Section 7.5 below. Just as important, the differences between the cases $n_{\text{max}}=10$ and $n_{\text{max}}=50$ are mostly negligible. Therefore, even with as few as $n_{\text{max}}=10$ modes, the modal decomposition retains nearly the full constraining power of the bispectrum. However, it should be remembered that Fig. 5 suggests the Fourier bispectrum reconstructed with so few modes will introduce more significant scatter. In a realistic analysis, these reconstruction errors could manifest themselves as a bias on the best-fit cosmological parameters. Unfortunately we cannot account for this bias in our Fisher analysis, but it deserves further investigation.

**Combination with other observables.—**The strong degeneracy between $\sigma_{8}$ and $b_{1}$ can be broken by other means. For example, it is possible to use weak lensing measurements that probe the matter power spectrum directly. Given that inclusion of 3-point correlation data yields the largest improvements for $\sigma_{8}$ and the bias, it is worthwhile considering what improvements should be expected were the bias to be fixed by other cosmological observations.

In a scenario of this kind the power spectrum constraints would not be weakened by marginalization over the bias parameters, and therefore inclusion of 3-point correlation data would no longer yield such a dramatic improvement for $\sigma_{8}$ . However, we still find encouraging improvements for many parameters. For example, inclusion of either Fourier or modal bispectrum measurements would decrease uncertainty on $\sigma_{8}$ by $\sim 25\%$ and all other parameters except $\Omega_{b}$ by $10\%$ – $15\%$ . Inclusion of $\ell$ measurements would decrease uncertainty on $\sigma_{8}$ by $20\%$ , on the dark energy parameters by $\sim 10\%$ , and for all other parameters by $\lesssim 5\%$ . We conclude that, even in the extreme case that $b_{1}$ and $b_{2}$ can somehow be determined exactly, inclusion of 3-point correlation data still provides valuable additional information.

These Fisher forecasts should be interpreted with some care. As explained above, we do not include a number of astrophysical and observational effects that complicate the analysis of realistic galaxy survey data. These include redshift uncertainties, redshift-space distortions, irregular survey geometries and shot noise. In particular, for the forecasts presented here the effective shot noise is set by the number density $\bar{n}=0.125\,h^{3}\,\text{Mpc}^{-3}$ of particles in our simulation suite. This is substantially larger than the galaxy number densities that will be achieved by upcoming surveys. We return to this issue in Section 8.2, where we discuss how our predictions would be modified by a more realistic number density.

7.3 Effect of non-Gaussian covariance and cross-covariance

The non-Gaussian covariance measured in simulations differs from the Gaussian approximation in two ways: (1) it includes additional contributions to the variance of each bin from higher-order correlations, and (2) it adds or enhances coupling between different bins of a single proxy, and between bins of different proxies. These non-Gaussian corrections generally lead to weaker parameter constraints when compared to forecasts constructed using the Gaussian approximation, because this assumes that every bin contributes independent information. In this section we compare the relative impact of non-Gaussian covariance for the different proxies by contrasting Fisher forecasts made with and without its inclusion. We give results for the combinations $P+ib$ , $P+\ell$ , $P+\beta$ and $P+B$ and each choice of theoretical model—tree-level SPT, 1-loop SPT, or the halo model.

**Increase in uncertainty from non-Gaussian contributions.—**Fig. 10 shows the relative increase $\sigma_{NG}/\sigma_{G}-1$ in predicted uncertainty for each parameter when non-Gaussian contributions are included. To estimate $\sigma_{G}$ we use the expressions for Gaussian covariance given in Section 3 with each quantity replaced by its value measured from our simulations. For example, to construct the Gaussian covariance for $ib$ we use equation (14) with $\sigma_{L}^{2}$ replaced by its measured value. We could equally well have constructed similar estimates using one of the theoretical models to calculate such values, but the result is not very different. The discussion in this section would continue to apply if we were to reproduce Fig. 10 using estimates generated by any of these prescriptions.

The increase in uncertainty induced by inclusion of non-Gaussian effects depends on the measure of 3-point correlations used to generate constraints, the method used to estimate the Gaussian covariance matrix, and the parameter in question. In general we find that the Gaussian approximation underpredicts the uncertainty for the Fourier bispectrum more strongly than for its proxies. Note also that—although $P+\beta$ and $P+B$ yield nearly identical constraints when the non-Gaussian covariance is used, as described in Section 7.2—the importance of the non-Gaussian covariance for these combinations is not the same. Since the quantity $B_{\epsilon}$ measured by $\beta$ is not the same as $B$ , neglecting cross-covariance with $P$ (as the Gaussian covariance does) will leave out different information for $P+\beta$ compared to $P+B$ .

Inclusion of non-Gaussian covariance impacts uncertainties for $w_{0}$ , $w_{a}$ and $\sigma_{8}$ more significantly than the other parameters. This non-uniformity means that it is not obvious how inclusion of non-Gaussian covariance might impact constraints from 3-point correlations on further parameters not considered here. For instance, a number of authors have used Gaussian covariances to forecast future constraints on a primordial bispectrum generated by inflation; see Scoccimarro et al. (2004), Sefusatti & Komatsu (2007), Sefusatti et al. (2012), Baldauf et al. (2016), Welling et al. (2016) and Tellarini et al. (2016). It is not yet clear how these forecasts will change when more realistic non-Gaussian covariances are used.

**Inclusion of cross-covariance.—**In Fig. 11 we summarize the influence of cross-covariance between $P$ and the 3-point measures by comparing constraints using the full non-Gaussian covariance to constraints where the cross-covariance has been set to zero. We find that inclusion of cross-covariances reduces the predicted uncertainties for nearly all parameters and choices of combination $P+X$, whether or not we marginalize over galaxy bias. In the few cases where inclusion of cross-covariance did not reduce the uncertainties (e.g. constraints on $\Omega_{m}$ from $P+B$ and $P+\beta$), the predicted error bar is weakened by less than $12\%$ of the error bar without cross-covariance. Overall, we find that ignoring cross-covariances can overestimate uncertainties by up to $\sim 40\%$ when we do not marginalize over the bias, and by $40\%$–$70\%$ for the special case of bispectrum constraints on the bias parameters themselves.

This reduction of uncertainties due to inclusion of cross-covariances may be surprising. While we have not explicitly identified the source of the improved constraining power, this is not a new feature of Fisher forecasts using non-Gaussian covariances. For example, a number of authors using cross-correlations between cluster counts, weak lensing power spectra and the weak lensing bispectrum have found that parameter constraints can improve when cross-covariances between strongly-coupled measurements are included (Takada & Bridle, 2007; Sato & Nishimichi, 2013; Kayo et al., 2013). But it is also possible that our improvements are partly due to the galaxy biasing model we have chosen. A simulation of halos, rather than dark matter alone, could be used to verify the effect when simultaneously constraining both cosmological parameters and galaxy bias.
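A two-bin toy model shows that such behaviour is at least possible in principle. The sketch below compares marginalized errors computed with the full covariance to those computed after the $P\times X$ blocks are zeroed. It is a hypothetical illustration and not a diagnosis of the effect seen in Fig. 11; the function names and toy numbers are our own.

```python
import numpy as np

def without_cross_covariance(cov, n_p):
    """Copy of cov with the P x X off-diagonal blocks set to zero.

    The first n_p rows/columns are taken to be the power spectrum bins.
    """
    out = np.array(cov, dtype=float, copy=True)
    out[:n_p, n_p:] = 0.0
    out[n_p:, :n_p] = 0.0
    return out

def sigma_ratio(dmu, cov, n_p):
    """sigma(cross-covariance ignored) / sigma(full), per parameter."""
    dmu = np.asarray(dmu, dtype=float)

    def errors(c):
        f = dmu @ np.linalg.solve(c, dmu.T)
        return np.sqrt(np.diag(np.linalg.inv(f)))

    return errors(without_cross_covariance(cov, n_p)) / errors(cov)

# One P bin and one X bin with unit variance and correlation 0.5.
toy_cov = np.array([[1.0, 0.5], [0.5, 1.0]])
ratio_opposite = sigma_ratio([[1.0, -1.0]], toy_cov, n_p=1)  # responses of opposite sign
ratio_aligned = sigma_ratio([[1.0, 1.0]], toy_cov, n_p=1)    # responses of equal sign
```

In this toy, positively correlated bins whose parameter responses have opposite signs give a ratio above unity, so ignoring the cross-covariance overestimates the error. When the responses share a sign the ratio falls below unity. Whether inclusion of cross-covariance helps therefore depends on how the correlation aligns with the parameter derivatives.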

The conclusion of this discussion is that an accurate estimate for the covariance matrix, including non-Gaussian contributions and off-diagonal terms, is important if we wish to obtain reliable constraints. Unfortunately, this is especially true for the Fourier bispectrum for which the Gaussian approximation most significantly underestimates the true parameter uncertainties. This implies that surveys aiming to generate constraints from inclusion of $B$ measurements cannot evade the computational difficulties associated with estimating their covariance matrix.

To mitigate these difficulties we could consider use of $P+\beta$ rather than $P+B$ . As we have seen in Section 7.2, these combinations yield nearly equivalent constraints using $95$ Fourier configurations and $50$ modal coefficients respectively, and therefore the modal decomposition makes the information content of the bispectrum more accessible by reducing the size of the covariance matrix needed to obtain it. We consider the efficiency with which each proxy can compress the information carried by $B$ in Section 8.1.

7.4 Theory-dependence of the forecasts

In Section 7.2 we have presented our Fisher forecasts based on simulated data, and in Section 7.3 we have discussed the influence of non-Gaussian covariance and cross-covariances. These results enable us to assess the information content carried by the Fourier bispectrum and its proxies, but the question of how easily these statistics can be deployed remains open. In particular, we would like to know whether the use of simulated data is essential, or whether any of the models described in Section 3 are sufficient. In this section we study the dependence of our forecasts on the choice of theoretical model used to estimate the derivatives $\partial\bm{\mu}/\partial\theta_{\alpha}$ in equation (59).

**Match to forecast from simulations.—**First, we consider whether there is a model that provides a clear best-match to the forecast using simulated data. Fig. 12 compares the forecasts for each parameter using different prescriptions for the covariance matrix and for different choices of theoretical model, with marginalization over the bias included or excluded. The bar heights represent the reduction in the predicted uncertainty provided by a given combination, relative to the base model of power spectrum data only combined with a CMB prior. The results of Section 7.2 are labelled ‘sim’. Unfortunately, for each combination $P+X$ there is no single choice of theoretical model yielding forecasts that provide the best match to the ‘sim’ outcome for all parameters—with or without marginalization over bias.

For example, consider the combination $P+B$ in the first column of Fig. 12. This summarizes forecasts generated by including non-Gaussian covariance and marginalization over the bias. For $\sigma_{8}$ it is 1-loop SPT that gives the best match to the ‘sim’ result, but for the linear bias parameter $b_{1}$ the best match comes from tree-level SPT.

Alternatively, one could ask whether any one model provides uniformly conservative or uniformly optimistic forecasts. If so, that model could be used to estimate upper or lower limits on the uncertainty for any chosen parameter. But Fig. 12 demonstrates that there are no models with such properties. For example, focusing again on the first column, there is no single choice of theoretical model for $P+B$ that forecasts the largest or smallest improvement for all parameters.

**Sensitivity to theory error.—**Next, we study the variation in forecasts for the Fourier bispectrum and its proxies when we change the model used to compute $\partial\bm{\mu}/\partial\theta_{\alpha}$ . To understand the sophistication required to obtain accurate models we will need to understand which of these statistics (if any) are especially sensitive or immune to theoretical mis-modelling. We measure this dependence by a sensitivity factor, which we define to be the ratio between the largest and smallest forecast uncertainties taken over the models of Section 3. A sensitivity factor close to unity indicates that a forecast uncertainty depends only weakly on the choice of theoretical model, while a large value indicates that the model has a strong influence on the final outcome.
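The sensitivity factor defined above is simple to evaluate once forecasts are available for each model. A minimal sketch follows; the dictionary layout and the numerical values are purely illustrative and are not taken from our results.

```python
import numpy as np

def sensitivity_factor(sigmas_by_model):
    """Ratio of largest to smallest forecast uncertainty across models.

    sigmas_by_model : dict mapping a model label to an array of forecast
                      1-sigma uncertainties, one entry per parameter
    """
    values = np.array(list(sigmas_by_model.values()), dtype=float)
    return values.max(axis=0) / values.min(axis=0)

# Illustrative numbers only (two parameters, three models).
example = {
    "tree": [0.10, 0.02],
    "1-loop": [0.12, 0.02],
    "halo": [0.08, 0.03],
}
factors = sensitivity_factor(example)
```

By construction the factor is at least unity, and it equals unity only when every model predicts the same uncertainty for that parameter.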

We plot these sensitivity factors in Fig. 13, computed with inclusion of all bias parameters and using non-Gaussian covariances. Therefore the sensitivity factor solely reflects the variation in uncertainty produced by different choices for theoretical model. We conclude that there is no single measure of 3-point correlations that consistently yields the largest or smallest sensitivity to variations in modelling. Therefore, there is apparently no single combination $P+X$ that should be preferred to minimize the effect of theory errors on inferred parameter constraints.

**Ranking by constraining power.—**Neither of these criteria provides a rationale to prefer a particular choice of theoretical model. Nevertheless, we do find some general trends. Irrespective of theoretical model, we find the largest reductions in parameter uncertainties when the bias is constrained simultaneously with the cosmological parameters. Also, the Fourier bispectrum and modal bispectrum consistently offer the most significant improvements compared to $P$-only measurements, with very similar predicted uncertainties. The line correlation function achieves moderate improvement compared to $P$-only, while the integrated bispectrum has very weak constraining power, at least for the parameter set we consider. We conclude that $P+B$ or $P+\beta$ should be preferred for constraints on $\Lambda$CDM parameters, with $P+\beta$ offering similar information at reduced computational cost as discussed at the end of Section 7.3.

**Relative importance of modelling and non-Gaussian covariance.—**Finally, we consider the relative importance of non-Gaussian covariance and theoretical modelling for obtaining quantitatively accurate forecasts. In Fig. 14 we show the fractional difference in Fisher forecasts induced by variation of theoretical model (orange bars) and use of the Gaussian approximation (blue bars). To quantify the significance of theoretical modelling we plot $\max(|\sigma_{NG,i}/\sigma_{NG}(\text{sim})-1|)$ , where $i\in\{\text{tree},\text{1-loop},\text{halo}\}$ . Therefore larger orange bars reflect more significant deviation from the simulated forecast due to theoretical uncertainty. Meanwhile we quantify the role of the covariance matrix by plotting $|\sigma_{G}(\text{sim})/\sigma_{NG}(\text{sim})-1|$ , so increasing blue bars show that the Gaussian approximation generates more significant errors in the forecast.
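The two diagnostics just defined can be written compactly. The sketch below assumes that forecasts are stored as arrays of $1\sigma$ uncertainties with one entry per parameter. The function names `theory_shift` and `gaussian_shift` are hypothetical.

```python
import numpy as np

def theory_shift(sigma_ng_by_model, sigma_ng_sim):
    """max_i |sigma_NG,i / sigma_NG(sim) - 1| over the theoretical models i."""
    sim = np.asarray(sigma_ng_sim, dtype=float)
    ratios = np.array(
        [np.asarray(v, dtype=float) / sim for v in sigma_ng_by_model.values()]
    )
    return np.abs(ratios - 1.0).max(axis=0)

def gaussian_shift(sigma_g_sim, sigma_ng_sim):
    """|sigma_G(sim) / sigma_NG(sim) - 1| for each parameter."""
    ratio = np.asarray(sigma_g_sim, dtype=float) / np.asarray(sigma_ng_sim, dtype=float)
    return np.abs(ratio - 1.0)
```

Comparing the two outputs parameter by parameter indicates whether the choice of model or the Gaussian approximation is the larger source of error in a given forecast.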

Fig. 14 shows that the impact of theoretical uncertainty for $P+\beta$ and $P+B$ is generally less significant than neglect of non-Gaussian covariance, whether or not we marginalize over the bias. In contrast, for $P+\ell$ the effect of modelling nearly always dominates because of the difficulties with the halo model discussed in Section 5.2. For $P+ib$ the non-Gaussian covariance plays an important role if the bias parameters are not included, but theoretical modelling dominates when they are.

On balance, these results indicate that our forecasts are slightly less sensitive to theory error than to the approximation of Gaussian covariance. This could be because the inverse covariance weighting suppresses contributions from the non-linear regime where the theoretical predictions are most discrepant. But the difference is not large: the average variation in our predicted uncertainties from $P+B$ and $P+\beta$ due to theory modelling is $36\%$ , whereas the variation due to Gaussian covariances is $49\%$ . Therefore, we conclude that both issues must be addressed in order to obtain quantitatively accurate results.

7.5 Signal-to-noise as a proxy for the information content

It is now necessary to address the question of why the large discrepancy in uplift between the signal-to-noise of $B$ and $B_{\epsilon}$ (equivalently $\beta$ ) observed in Section 6 did not translate into significant differences in the forecast for parameter uncertainties in Section 7.2.

Consider a vector of values $S$ combining measures $P$ and $X$ of the 2- and 3-point correlation data, respectively, as defined below equation (52). For a given parameter $\theta$ the reduction in uncertainty compared to measurements from $P$ alone can be estimated in the Fisher framework by

[TABLE]

To avoid ambiguity we use the notation $\bm{\mathsf{C}}^{\text{P}}$ to denote the covariance matrix of the power spectrum only. Meanwhile, the increase in signal-to-noise in the same scenario is given by

[TABLE]

The uplift in signal-to-noise is often taken as an approximation to the reduction in parameter uncertainty, which avoids the need to compute $\partial S_{i}/\partial\theta$ . As we have seen in Section 5.2, these derivatives can be rather fragile and are susceptible to significant errors caused by theory mis-modelling. Unfortunately, when applied to $S=P+B$ and $S=P+\beta$ our analysis demonstrates that the ratios $\bm{\mathsf{F}}_{\theta}(P+B)/\bm{\mathsf{F}}_{\theta}(P)$ and $\bm{\mathsf{F}}_{\theta}(P+B_{\epsilon})/\bm{\mathsf{F}}_{\theta}(P)$ are nearly equal, whereas the same ratios constructed using $\mathcal{S}/\mathcal{N}$ are very discrepant. Therefore we must conclude that improvements in signal-to-noise cannot always be interpreted as a predictor of the improvement in Fisher information.

**Invariance of the Fisher matrix.—**First consider the Fisher matrix. Suppose we perform a redefinition so that $S_{i}\rightarrow S^{\prime}_{i}=S^{\prime}_{i}(S_{j})$ , where $S^{\prime}_{i}$ may be an arbitrary nonlinear function of the original measurements. For example, the transformation from $B$ to $B_{\epsilon}$ is of this type. The derivative $\partial S_{i}/\partial\theta_{\alpha}$ transforms ‘contravariantly’ on its index $i$ , in the sense $\partial S^{\prime}_{i}/\partial\theta_{\alpha}=\sum_{m}(\partial S^{\prime}_{i}/\partial S_{m})(\partial S_{m}/\partial\theta_{\alpha})$ . Meanwhile, the covariance matrix becomes

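$$\bm{\mathsf{C}}^{S^{\prime}}_{ij}=\sum_{m,n}\frac{\partial S^{\prime}_{i}}{\partial S_{m}}\frac{\partial S^{\prime}_{j}}{\partial S_{n}}\bm{\mathsf{C}}^{S}_{mn}+\cdots, \tag{64}$$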

where ‘ $\cdots$ ’ denotes terms involving higher order correlations that we have not written explicitly. Provided these are small compared to the $\bm{\mathsf{C}}^{S}_{mn}$ term, equation (64) shows that the covariance matrix also transforms ‘contravariantly’, and therefore that its inverse transforms ‘covariantly’. Subject to these approximations we conclude that the Fisher matrix should be roughly invariant. This agrees with our observation that $\bm{\mathsf{F}}_{\theta}(P+B)$ and $\bm{\mathsf{F}}_{\theta}(P+B_{\epsilon})$ are nearly equal, demonstrated numerically in Table 3.

Now consider the signal-to-noise. Since $S_{i}$ has neither a co- nor a contravariant transformation law, the combination $\sum_{i,j}S_{i}\bm{\mathsf{C}}^{-1}_{ij}S_{j}$ appearing in the signal-to-noise will typically not be invariant. Therefore different choices $S_{i}$ and $S^{\prime}_{i}$ may yield inequivalent results for $\mathcal{S}/\mathcal{N}$ . For example, we have verified that using $P+\ln B$ predicts a significant increase in the signal-to-noise compared to $P+B$ , whereas their Fisher matrices continue to agree. In Table 4 we summarize the improvement in unmarginalized constraints from the addition of $B$ or $B_{\epsilon}$ . This demonstrates that empirically the increase in signal-to-noise from $B_{\epsilon}$ provides a more accurate estimate of the Fisher information than $B$ . This property holds for both proxies of $B_{\epsilon}$ , namely the modal bispectrum and the line correlation function.
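The contrast between the two quantities can be checked numerically. The sketch below uses a toy data vector, a random positive-definite covariance and the transformation $S^{\prime}_{i}=\ln S_{i}$, keeping only the leading term of the transformed covariance; all inputs are hypothetical and unrelated to the measurements in this paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5

# Toy inputs (assumptions): positive data vector so that ln S is defined,
# an arbitrary derivative vector, and a symmetric positive-definite covariance.
S = rng.uniform(1.0, 2.0, size=n)
dS = rng.normal(size=n)                      # dS_i / dtheta
A = rng.normal(size=(n, n))
C = 0.01 * (A @ A.T + n * np.eye(n))

def fisher(d, cov):
    """Single-parameter Fisher information d^T C^{-1} d."""
    return float(d @ np.linalg.solve(cov, d))

def sn2(s, cov):
    """Squared signal-to-noise s^T C^{-1} s."""
    return float(s @ np.linalg.solve(cov, s))

# Nonlinear redefinition S' = ln S with Jacobian J_im = dS'_i/dS_m = diag(1/S).
J = np.diag(1.0 / S)
S_new = np.log(S)
dS_new = J @ dS                              # chain rule for the derivative
C_new = J @ C @ J.T                          # leading term only, higher orders dropped

print("Fisher:", fisher(dS, C), fisher(dS_new, C_new))
print("(S/N)^2:", sn2(S, C), sn2(S_new, C_new))
```

The two Fisher values agree to rounding error because the Jacobian factors cancel, whereas the signal-to-noise changes because the data vector itself does not transform with the Jacobian.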

**Gaussian limit.—**This outcome is not inconsistent with the result that $B$ and $B_{\epsilon}$ show an equivalent uplift in signal-to-noise in the Gaussian approximation. In this case the covariance matrix for $B_{\epsilon}$ is $\bm{\mathsf{C}}_{ij}^{B_{\epsilon}}=\mathsf{N}\bm{1}_{ij}$ , where the constant $\mathsf{N}$ takes the values $1$ , $2$ or $6$ for scalene, isosceles and equilateral configurations, respectively, as described in Section 3. In the same approximation the covariance matrix for the Fourier bispectrum is $\bm{\mathsf{C}}_{ij}^{B}=\mathsf{N}P(k_{i_{1}})P(k_{i_{2}})P(k_{i_{3}})\bm{1}_{ij}$ . Therefore the signal-to-noise for $B$ and $B_{\epsilon}$ is identical:

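$$\left(\frac{\mathcal{S}}{\mathcal{N}}\right)^{2}_{B}=\sum_{i}\frac{B_{i}^{2}}{\mathsf{N}\,P(k_{i_{1}})P(k_{i_{2}})P(k_{i_{3}})}=\sum_{i}\frac{B_{\epsilon,i}^{2}}{\mathsf{N}}=\left(\frac{\mathcal{S}}{\mathcal{N}}\right)^{2}_{B_{\epsilon}}.$$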

In the Gaussian approximation the power spectrum is an independent source of information, which explains the agreement. However, once off-diagonal contributions in the covariance matrix are included, $B$ and $P$ are no longer independent and non-linear combinations may give very different results for the signal-to-noise.

**Comparison with Chan & Blot.—**Our signal-to-noise for $P+B$ differs from that reported by Chan & Blot (2016) because we include cross-covariance (Section 6). Since empirically the signal-to-noise of $P+B_{\epsilon}$ gives a more accurate estimate of the information gain from 3-point correlation data, the $\sim 26\%$ expected improvement from the 3-point information in $B_{\epsilon}$ is in good agreement with the $\sim 30\%$ improvement suggested by Chan & Blot (2016). However, the details of these calculations are rather different. The unmarginalized constraints in Table 4 and most of the marginalized constraints in Table 3 support this conclusion. For $\sigma_{8}$ , $b_{1}$ and $b_{2}$ , for which the effect in Table 3 is substantially larger than $\sim 30\%$ , we ascribe the improvement to degeneracies of $P$ that are broken by 3-point correlation data.

8 Discussion

8.1 Compression and efficiency of the Fourier bispectrum proxies

In an ideal survey aiming to measure the Fourier bispectrum we should clearly choose a bin width $\Delta k$ that is sufficiently small to reproduce all small-scale features of interest. However, because the number of Fourier configurations in a volume with mode cut-off $k_{\text{max}}$ scales as $\sim(k_{\text{max}}/\Delta k)^{3}$ this task will quickly become computationally expensive. And, as we have emphasized several times, a more serious problem is that we must estimate and invert the covariance matrix for all these measurements. This requires us to perform at least as many N-body simulations as the number of configurations that we retain.
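The growth in the number of configurations can be illustrated with a simple count. The sketch below assumes linear bins starting at $k=0$, labels each configuration by an unordered triple of bins, and applies the triangle condition to the bin centres; this counting convention is an assumption and may differ in detail from the binning described in Section 4.8.

```python
from itertools import combinations_with_replacement

def count_triangle_bins(n_bins):
    """Count unordered bin triples whose centres satisfy the triangle condition.

    Bin centres sit at (i + 1/2) * delta_k for i = 0, ..., n_bins - 1, where
    n_bins plays the role of k_max / delta_k. Integer arithmetic (everything
    multiplied by 2) avoids floating-point edge cases.
    """
    count = 0
    for i, j, l in combinations_with_replacement(range(n_bins), 3):
        # Tuples are sorted, so l labels the longest side.
        if (2 * i + 1) + (2 * j + 1) >= (2 * l + 1):
            count += 1
    return count

for n in (6, 9, 18, 36):
    print(n, count_triangle_bins(n))
```

Under this convention 9 and 6 bins per side give 95 and 34 configurations, which coincide with the counts quoted later in this section for $\Delta k=8k_{\mathrm{f}}$ and $\Delta k=12k_{\mathrm{f}}$ . Doubling the number of bins per side increases the count by a factor approaching 8, as expected from the cubic scaling.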

In this section we consider how well this large number of Fourier configurations can be compressed by the proxies described in Section 2. Suppose that available resources limit the number of simulations that can be performed in such a way that we can estimate an accurate covariance matrix for $\sim 30$ bins of the Fourier bispectrum or one of its proxies, in combination with another $30$ measurements of the power spectrum $P(k)$ . Among the measures of 3-point correlations that we consider, is there a preferred choice that provides optimal constraints on our set of cosmological parameters? If so, this measure would provide the most successful compression of the full Fourier bispectrum into a manageable number of measurements.

**Compression by reduction to $\leqslant 30$ bins.—**To this end we combine the power spectrum bins with a single additional configuration from the Fourier bispectrum or one of its proxies, and compute the corresponding Fisher matrix (as in Section 7.1) using values for $\partial\bm{\mu}/\partial\theta_{\alpha}$ estimated from our simulation suite. The four left panels of Fig. 15 show the reduction in predicted uncertainty—defined as the shrinkage of the error bar, $1-\sigma_{P+X}/\sigma_{P}$ —for the representative parameters $\sigma_{8}$ (solid lines) and $w_{0}$ (dotted lines) for each of the possible bins. Using these reductions as a measure of the information stored in each bin we conclude that most of the information carried by the Fourier bispectrum $B$ is contained in small-scale triangles (towards larger triangle index). A similar conclusion applies for the line correlation function, for which significant reductions occur only for the first $\sim 12$ bins, corresponding to the range of scales $10\,h^{-1}\,\text{Mpc}$ – $50\,h^{-1}\,\text{Mpc}$ . This is reasonable, because the line correlation is constructed to give a negligible signal on large scales. Finally, while the modal decomposition exhibits some variability, smaller mode numbers typically provide larger gains. The integrated bispectrum shows consistently weak improvements over all bins.
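The single-bin test described above can be sketched as follows, assuming the usual Fisher matrix for a Gaussian likelihood with parameter-independent covariance, $\bm{\mathsf{F}}_{\alpha\beta}=\sum_{i,j}(\partial\mu_{i}/\partial\theta_{\alpha})\bm{\mathsf{C}}^{-1}_{ij}(\partial\mu_{j}/\partial\theta_{\beta})$ . The function names and the random toy inputs are hypothetical stand-ins for the simulation measurements.

```python
import numpy as np

def fisher_matrix(dmu, cov):
    """F = dmu^T C^{-1} dmu, with dmu of shape (n_data, n_params)."""
    return dmu.T @ np.linalg.solve(cov, dmu)

def marginalized_sigma(F):
    """Marginalized 1-sigma errors, sqrt of the diagonal of F^{-1}."""
    return np.sqrt(np.diag(np.linalg.inv(F)))

def shrinkage(dmu, cov, p_idx, x_idx):
    """1 - sigma_{P+X} / sigma_P for every parameter.

    p_idx, x_idx: indices of the power spectrum bins and of the extra bins
    within the joint data vector; cov is the joint covariance, so the
    cross-covariance between P and X is included.
    """
    p = np.asarray(p_idx)
    px = np.concatenate([p_idx, x_idx])
    F_p = fisher_matrix(dmu[p], cov[np.ix_(p, p)])
    F_px = fisher_matrix(dmu[px], cov[np.ix_(px, px)])
    return 1.0 - marginalized_sigma(F_px) / marginalized_sigma(F_p)

# Toy inputs (assumptions): 6 power spectrum bins, 3 candidate extra bins, 2 parameters.
rng = np.random.default_rng(1)
n_p, n_x, n_par = 6, 3, 2
dmu = rng.normal(size=(n_p + n_x, n_par))
A = rng.normal(size=(n_p + n_x, n_p + n_x))
cov = A @ A.T + (n_p + n_x) * np.eye(n_p + n_x)

p_idx = list(range(n_p))
for b in range(n_p, n_p + n_x):
    print(b, shrinkage(dmu, cov, p_idx, [b]))
```

Each printed row is the analogue of one point in the left panels of Fig. 15: the shrinkage obtained by adding exactly one extra bin to the power spectrum.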

Second, for each combination $P+X$ we identify a set of $30$ bins for $X$ that provide the largest improvements. Adding them cumulatively to the power spectrum, starting from the bin carrying most information, we obtain the plot on the right-hand side of Fig. 15. Both the line correlation function and the modal bispectrum converge rapidly to the maximal improvement available from the entire set of bins that we measure (this is $30$ bins for $\ell$ and $50$ modes for $\beta$ —see Table 2). For example, the line correlation is already within $2\%$ of the maximum after we have added $\sim 2$ bins, while only $\sim 5$ modes of $\beta$ are required to arrive at a similar value for the modal bispectrum. In comparison the Fourier bispectrum converges much more slowly to the maximum provided by the $95$ bins that we measure. This is especially evident for $\sigma_{8}$ , for which the improvement from the Fourier bispectrum has not yet converged to its maximum value after the $30^{\mathrm{th}}$ bin. (For guidance, we mark this maximum value with black arrows on the plot.) However, it should be noted that our procedure to select the set of $30$ bins is not optimal because it does not account for covariances between them. By analysing random subsets of the $95$ possible bispectrum bins we find that faster convergence is possible, giving up to $\sim 90\%$ of the maximum reduction after $30$ bins.
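The ranking-and-accumulation procedure can be sketched in the same Fisher framework. As noted above, ranking bins by their individual improvement ignores covariances between them, so this is a sketch of the simple procedure rather than an optimal selection. All names and toy inputs are hypothetical.

```python
import numpy as np

def sigma(dmu, cov, idx, param):
    """Marginalized error on one parameter using the data bins in idx."""
    idx = np.asarray(idx)
    F = dmu[idx].T @ np.linalg.solve(cov[np.ix_(idx, idx)], dmu[idx])
    return float(np.sqrt(np.linalg.inv(F)[param, param]))

def cumulative_shrinkage(dmu, cov, p_idx, x_idx, param, n_keep=30):
    """Rank extra bins by single-bin shrinkage, then add them cumulatively."""
    p_idx = list(p_idx)
    s_p = sigma(dmu, cov, p_idx, param)
    single = [1.0 - sigma(dmu, cov, p_idx + [b], param) / s_p for b in x_idx]
    ranked = np.argsort(single)[::-1][:n_keep]        # most informative first
    order = [x_idx[i] for i in ranked]
    curve = [1.0 - sigma(dmu, cov, p_idx + order[:m], param) / s_p
             for m in range(1, len(order) + 1)]
    return order, curve

# Toy inputs (assumptions): 6 power spectrum bins, 8 extra bins, 2 parameters.
rng = np.random.default_rng(2)
n_p, n_x, n_par = 6, 8, 2
dmu = rng.normal(size=(n_p + n_x, n_par))
A = rng.normal(size=(n_p + n_x, n_p + n_x))
cov = A @ A.T + (n_p + n_x) * np.eye(n_p + n_x)

order, curve = cumulative_shrinkage(dmu, cov, range(n_p),
                                    list(range(n_p, n_p + n_x)), param=0, n_keep=5)
print(order)
print(np.round(curve, 3))
```

The cumulative curve cannot decrease, because each step adds data under the full joint covariance. How quickly it approaches its plateau is what distinguishes the proxies in the right-hand panel of Fig. 15.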

**Compression by broadening bins.—**Rather than reducing the number of configurations by restriction to a subset, we might alternatively increase the width of each bin. The same volume of data would then be compressed into fewer measurements. To compare the performance of this strategy we repeat the analysis described above for the Fourier bispectrum with a broader bin width $\Delta k=12k_{\mathrm{f}}$ , which gives $34$ rather than $95$ Fourier configurations with $k_{\text{max}}=0.3\,h\,\text{Mpc}^{-1}$ . We plot the corresponding cumulative reduction in uncertainty for $\sigma_{8}$ as star-shaped symbols in the right-hand panel of Fig. 15. After $30$ bins the improvement is similar to that obtained from the modal bispectrum, with the same caution about rate of convergence due to correlation between bins. Therefore—rather surprisingly—in this case we find no clear preference for the bin width $\Delta k=8k_{\mathrm{f}}$ or $\Delta k=12k_{\mathrm{f}}$ , except that $\Delta k=8k_{\mathrm{f}}$ is more computationally expensive, and it is more difficult to find an optimal subset of configurations. However, it is not clear whether this conclusion would survive in a more realistic analysis, where the signal can be noisy and demands finer binning. To explore these issues in detail would require a more comprehensive analysis.

**Results.—**This analysis agrees with the conclusions of Sections 7.3 and 7.4, and supports the modal bispectrum as a good choice of proxy for 3-point correlation data. In addition to the advantages discussed in previous sections, it requires the fewest bins and loses almost no information.

These results could be modified in cases where it is possible to compute a covariance matrix for $\gg 30$ configurations of the Fourier bispectrum, as done (for example) by Gil-Marin et al. (2016). However, the mock catalogues used to produce such covariance matrices are often generated using perturbation theory and therefore are likely to be inaccurate on small scales. We expect that it is a better strategy to use fewer bins and obtain high-quality measurements of the covariance matrix from catalogues generated using full N-body simulations. The significant benefit of the modal decomposition is that it facilitates construction of the smallest set of bins that still carry a majority of the information.

Finally, although the line correlation function provides weaker improvements than either the Fourier bispectrum or modal bispectrum, it has the advantage that it clearly separates the scales carrying useful information from those that do not—all bins with $r\gtrsim 50\,h^{-1}\,\text{Mpc}$ have negligible impact. It is also possible that the performance of the line correlation function could be improved by relaxing the condition of strict collinearity, which would increase the range of Fourier configurations it is able to aggregate.

8.2 Shot Noise

Galaxies are discrete, point-like tracers of the underlying matter fluctuations, and therefore samples of their abundance are affected by shot noise. This noise is expected to impact higher-order statistics more significantly than the power spectrum (Sefusatti & Scoccimarro, 2005; Chan & Blot, 2016). Up to this point our analysis has implicitly used the low effective shot noise provided by our simulations, and therefore there is some concern that our forecasts will degrade with larger, more realistic noise. In this section we perform an approximate analysis of this degradation and quantify its effect on our predicted parameter uncertainties.

Assuming Poisson statistics, we may correct for shot-noise contributions to the observed discrete power spectrum $\hat{P}^{\text{disc}}$ and bispectrum $\hat{B}^{\text{disc}}$ by subtraction (Peebles, 1980; Matarrese et al., 1997),

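$$\hat{P}(k)=\hat{P}^{\text{disc}}(k)-\frac{1}{\bar{n}}, \tag{66a}$$

$$\hat{B}(k_{1},k_{2},k_{3})=\hat{B}^{\text{disc}}(k_{1},k_{2},k_{3})-\frac{1}{\bar{n}}\left[\hat{P}(k_{1})+\hat{P}(k_{2})+\hat{P}(k_{3})\right]-\frac{1}{\bar{n}^{2}}. \tag{66b}$$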

Here, $\bar{n}$ is the average number density of the discrete tracers. We use the upper and lower limits $\bar{n}_{1}=10^{-2}\,h^{3}\,\text{Mpc}^{-3}$ and $\bar{n}_{2}=10^{-4}\,h^{3}\,\text{Mpc}^{-3}$ to represent optimistic and pessimistic levels of shot noise for upcoming galaxy surveys. To measure $\hat{P}^{\text{disc}}$ and $\hat{B}^{\text{disc}}$ we downsample the number of particles in our simulation suite by selecting random subsets matching the desired averaged density $\bar{n}$ , and use this to compute corrected estimators $\hat{P}$ and $\hat{B}$ from equations (66a) and (66b). Although this downsampling procedure will not introduce exactly Poisson shot noise, we have checked that it is nearly Poisson by verifying that the corrected quantities agree with measurements made using the full set of particles to within a few percent. Strictly speaking, the covariance matrix of $\hat{P}$ and $\hat{B}$ obtained in this way is the matter covariance with Poisson shot noise, but for our fiducial biasing model we may interpret it as the covariance of the galaxy power spectrum and bispectrum with Poisson shot noise. We use this covariance, leaving the parameter derivatives unchanged from Section 7.2, to compute the Fisher matrices.
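The downsampling and subtraction steps can be sketched as below. The corrections use the standard Poisson form, $1/\bar{n}$ for the power spectrum and $[\hat{P}(k_{1})+\hat{P}(k_{2})+\hat{P}(k_{3})]/\bar{n}+1/\bar{n}^{2}$ for the bispectrum, which is assumed here to match the convention of this paper up to Fourier normalization factors. The function names and the toy particle catalogue are hypothetical.

```python
import numpy as np

def downsample(positions, n_bar, volume, rng):
    """Select a random particle subset matching a target mean density n_bar."""
    n_target = int(round(n_bar * volume))
    idx = rng.choice(len(positions), size=n_target, replace=False)
    return positions[idx]

def correct_power(p_disc, n_bar):
    """Subtract the Poisson shot-noise term from the discrete power spectrum."""
    return p_disc - 1.0 / n_bar

def correct_bispectrum(b_disc, p1, p2, p3, n_bar):
    """Subtract Poisson shot noise from the discrete bispectrum.

    p1, p2, p3 are the shot-noise-corrected power spectra on the three sides.
    """
    return b_disc - (p1 + p2 + p3) / n_bar - 1.0 / n_bar**2

# Toy catalogue (assumption): uniform random points in a (100 Mpc/h)^3 box.
rng = np.random.default_rng(0)
box = 100.0
positions = rng.uniform(0.0, box, size=(1000, 3))
subset = downsample(positions, n_bar=1.0e-4, volume=box**3, rng=rng)
print(len(subset))

# Round trip with hypothetical amplitudes: add shot noise, then remove it.
P, B, n_bar = 1.0e4, 1.0e8, 1.0e-4
p_disc = P + 1.0 / n_bar
b_disc = B + 3.0 * P / n_bar + 1.0 / n_bar**2
print(correct_power(p_disc, n_bar), correct_bispectrum(b_disc, P, P, P, n_bar))
```

In practice the discrete spectra would be measured from the downsampled particles with the estimators of Section 4, and the corrected values compared with those from the full particle set.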

We plot forecasts using the fiducial number densities $\bar{n}_{1}$ and $\bar{n}_{2}$ in Fig. 16, with orange ellipses corresponding to the lower noise level (higher number density) and blue ellipses corresponding to the higher noise level (lower number density). The orange ellipses show good agreement with the forecasts for the idealized scenario of Section 7.2, indicating that relatively little degradation occurs. However, it is unlikely that such high number densities will be attained in the near future. By contrast the blue ellipses represent a conservative view of what should be possible.

If shot noise degrades the signal from 3-point correlations more strongly than for 2-point correlations then the fractional improvement from its inclusion should be smaller for low $\bar{n}$ . In terms of Fig. 16 this means that the difference between the light and dark blue ellipses should be smaller than the difference between the light and dark orange ellipses. This effect is visible for some parameters, such as $\sigma_{8}$ . However, in the case of $\Omega_{m}$ , $w_{0}$ and $w_{a}$ the fractional improvement from inclusion of 3-point correlation data is larger at lower $\bar{n}$ . The effect for $w_{0}$ and $w_{a}$ is particularly striking. Using all particles in our simulations, the addition of $B$ data decreased measurement uncertainties by $16\%$ and $15\%$ , respectively (see Table 3). With $\bar{n}=10^{-4}\,h^{3}\,\text{Mpc}^{-3}$ we find improvements of $41\%$ and $36\%$ . We interpret this to mean that recovery of cosmological information in the presence of shot noise depends significantly on cross-covariances between measurements. These cross-covariances themselves depend on the shot noise and can partially subtract its effect.

9 Conclusions

As large-scale structure surveys grow in size and sophistication, the rapidly approaching cosmic variance limit on 2-point statistics encourages us to look to higher-order correlations, such as the 3-point function, as a new source of information. Previously, Sefusatti et al. (2006) suggested that considerable additional constraining power could be achieved by combining the power spectrum and bispectrum. On the other hand, the signal-to-noise analysis given by Chan & Blot (2016) pointed to no more than modest improvements. Our results show that there is a significant benefit from inclusion of three-point correlation data, but its benefits must be balanced against the challenges it brings.

In this paper, we focus on two particular challenges: (1) The number of measurable configurations of the Fourier bispectrum is generally very large unless one coarse-grains the data. We have investigated whether the modal bispectrum, line correlation function and integrated bispectrum can act as ‘proxies’ for the Fourier bispectrum, compressing its information into fewer configurations without unacceptable information loss. (2) Bispectrum observations are difficult to model to the same accuracy as the power spectrum. Errors in clustering predictions from theoretical models, in addition to assumptions about covariances and noise properties, generally propagate into inaccurate error bars or a bias on inferred parameters. We have quantified how our forecasts are influenced by both the assumption of Gaussian covariance and theoretical errors.

To do so we have measured the power spectrum, Fourier bispectrum and each of its proxies from a suite of 200 dark matter N-body simulations at redshifts $z=0$ , $z=0.52$ and $z=1$ to obtain fully non-Gaussian covariances and cross-covariances. We measure the dependence of each measurement on the cosmological parameters $\{\Omega_{m},\Omega_{b},w_{0},w_{a},\sigma_{8},n_{s},h\}$ using additional simulations displaced from our fiducial model. We assume a local Lagrangian biasing scheme that includes two bias parameters, $\{b_{1},b_{2}\}$ . Using all these components, in combination with theoretical predictions for each proxy from tree-level and 1-loop SPT and the halo model, we have conducted a signal-to-noise analysis and implemented the Fisher forecasting method for an idealized survey scenario. Our main results on the constraining power and future viability of each measure of 3-point correlations are:

**Comparison of 3-point correlation measures.—**Section 7.2 presented our main results. Our forecasts show that inclusion of the Fourier bispectrum offers significant improvements over the power spectrum alone, with $\operatorname{\mathcal{O}}(10\%-30\%)$ improvement on cosmological parameter constraints, and up to $\operatorname{\mathcal{O}}(80\%)$ improvement when it is used to break degeneracies with the bias parameters. The modal bispectrum offers an attractive alternative, achieving equivalent constraints with as few as 10 modes. However, up to 50 modes may be necessary to reconstruct the Fourier bispectrum to within $\lesssim 10\%$ accuracy on individual triangle configurations. The line correlation function appears to be slightly less optimal, although a future extension to sample more Fourier configurations by relaxing the requirement of strict collinearity may improve its performance. The integrated bispectrum offers little constraining power for our set of cosmological parameters. It is sensitive to highly squeezed triangles, whereas the gravitational bispectrum peaks on equilateral triangles. This property of $ib$ is a disadvantage for our purposes, but may be an advantage if one is interested in studying squeezed-mode primordial non-Gaussianity with minimal degeneracies.

**Data compression.—**In Section 8.1, we explored how the total constraining power of each measure is distributed over the total number of data bins. While the Fourier bispectrum and modal bispectrum give nearly equivalent parameter constraints when $\sim 30$ bins are used, the modal method converges to its full constraining power with a smaller subset of bins. We conclude that the modal bispectrum provides more efficient access to the information carried by 3-point correlations.

We note that more realistic survey scenarios—for example, accounting for noisy data—may require finer binning. Increasing the binning resolution of the Fourier bispectrum by a factor of $n$ in each $k$ -dimension corresponds to a factor $\operatorname{\mathcal{O}}(n^{3})$ increase in configurations. The number of simulations required to accurately capture their covariance would increase similarly. If the number of modal coefficients required to capture fine features of the bispectrum does not grow so dramatically, it is possible that the modal bispectrum could accumulate an even larger advantage compared to the Fourier bispectrum.

**Signal-to-noise ratio as a measure of information content.—**In Sections 6, 7.2 and 7.5 we argue that use of the signal-to-noise ratio to predict the constraining power of 3-point correlation data can be misleading. We show that the bispectrum and phase bispectrum—which is probed by the modal bispectrum—give significantly different signal-to-noise ratios, but still yield nearly identical forecasts. As we describe in Section 7.5, for the scenarios considered in this paper, the improvement shown by these forecasts is empirically better predicted by the signal-to-noise ratio of the phase bispectrum $B_{\epsilon}$ than the Fourier bispectrum $B$ . The $\sim\operatorname{\mathcal{O}}(30\%)$ uplift in signal-to-noise from the phase bispectrum translates to the same improvement in cosmological parameter constraints, except for those where degeneracies play a significant role. As we explain in Section 7.5, while this improvement is numerically consistent with Chan & Blot (2016), our procedure is rather different. For a general parameter set and a given measure of the 3-point correlations, the signal-to-noise will not typically give an accurate estimate of its constraining power.

**Impact of non-Gaussian covariances.—**Accounting for non-Gaussian covariance is essential for optimally constraining cosmological parameters. In Section 7.3 we showed that the Fourier bispectrum estimator is particularly sensitive to the covariance: our predicted uncertainties may be nearly a factor of 4 too small if the Gaussian approximation is used. At the same time, we find that the non-Gaussian cross-covariance between the power spectrum and the Fourier bispectrum or its proxies generally results in parameter errors that are $\operatorname{\mathcal{O}}(10\%)$ smaller than if cross-covariances are ignored.

**Impact of theoretical modelling uncertainties.—**Our results in Section 7.4 indicate that the impact of theory errors on our predicted uncertainties is smaller than the impact of assuming Gaussian covariance, although both approximations change the forecasts by $\sim 30\%$ to $50\%$ on average. In this paper we measure the effect of theoretical uncertainty by comparing forecasts using SPT and the halo model to forecasts derived purely from N-body measurements. Our approach differs from that of Baldauf et al. (2016) and Welling et al. (2016), who incorporated estimates of the theory error into their Fisher forecasts by taking the error in each data bin to be the sum of statistical and theoretical errors.

**Impact of shot noise.—**To assess the impact of shot noise, in Section 8.2 we down-sample our simulation suite to averaged number densities of $\bar{n}=10^{-2}\,h^{3}\,\text{Mpc}^{-3}$ and $10^{-4}\,h^{3}\,\text{Mpc}^{-3}$ , and compute forecasts using non-Gaussian covariance matrices that include low and high levels of Poisson shot noise. Contrary to naïve expectations, we find that the addition of 3-point correlation information can become more significant at high levels of shot noise owing to the non-trivial dependence of the cross-covariance on $\bar{n}$ . This appears most significant for the dark energy parameters $w_{0}$ and $w_{a}$ , and suggests that 3-point correlation information may be crucial to distinguish between dark energy models. More generally, our result implies that 3-point correlation measurements may yield significant additional constraining power even when shot noise levels are high.

To make robust inferences with 3-point correlation information, future surveys will require refinement of the methods we have considered here. For example, while we have demonstrated that the modal decomposition provides efficient data compression of the matter bispectrum in an idealized survey, it will be important to verify that this remains true when halo distributions, redshift-space distortions and the complex noise properties of realistic surveys are introduced. We have emphasized the importance of including non-Gaussian covariances and theory uncertainties in our forecasts. Realistic analyses will likely require more efficient ways to obtain covariances, and a consistent approach to inclusion of theory errors in software pipelines. Achieving each of these aims will be an important milestone ahead of upcoming surveys of large-scale structure.

Acknowledgements

The work reported in this paper has been supported by the European Research Council under the European Union’s Seventh Framework Programme (FP/2007–2013) and ERC Grant Agreement No. 308082 (JB, DR, DS). This work was supported by the Science and Technology Facilities Council [grant numbers ST/L000652/1, ST/P000525/1] (DS, RES). AE acknowledges support from the UK Science and Technology Facilities Council via Research Training Grant [grant number ST/M503836/1], and thanks Roman Scoccimarro and the Physics Department of New York University for hospitality during the final phases of this project. DR acknowledges useful conversations on the normalization of the modal decomposition with Hemant Shukla. JB would like to thank Benjamin Joachimi for useful discussions.

**Data availability statement.—**To assist those wishing to replicate or extend our results, we have made available measurements of the power spectrum, bispectrum, integrated bispectrum, line correlation function, and modal bispectrum coefficients that have been extracted from our N-body simulation suite.

License: Creative Commons Attribution 4.0 International

Author: © University of Sussex 2017. Contributed by Donough Regan & Alexander Eggemeier

Attribution: Please cite zenodo.org DOI and this paper

Download

Appendix A Construction of the modal decomposition

A.1 Construction of the $Q$ -basis

The goal of the modal decomposition is to write the estimated bispectrum in the form

[TABLE]

where $w(k_{1},k_{2},k_{3})$ is the arbitrary weighting function (22), and the $Q_{n}$ represent basis modes with coefficients $\beta_{n}^{Q}$ . The $Q_{n}$ then contain all the information about the bispectrum. They should span the possible functions on wavenumbers $k_{i}$ that satisfy the triangle condition, $\sum_{i}k_{i}\geqslant 2\max\{k_{1},k_{2},k_{3}\}$ (denoted by $\mathcal{V}$ in the main text) but are otherwise arbitrary. For our concrete numerical results we choose a basis built out of one-dimensional polynomials $q_{p}(x)$ which are orthonormal within $\mathcal{V}$ (Fergusson et al., 2010). More precisely, in a unit box, we define the integral $\mathcal{T}[f]=\int_{\mathcal{V}}f(x)\,\text{d}x\,\text{d}y\,\text{d}z$ , where $x,y,z$ satisfy the triangle condition within the box $x,y,z\in[0,1]$ . Evaluating the $y$ and $z$ integrals, one finds that $\mathcal{T}[f]=0.5\int_{0}^{1}f(x)\,x(4-3x)\,\text{d}x$ . This allows one to define an inner product, $\langle f,g\rangle\equiv\mathcal{T}[fg]$ (which is not equal to the inner product (25)) and set up a generating function for the one-dimensional polynomials, $q_{n}$ , using $w_{n}=\mathcal{T}[x^{n}]$ , in the form of a secular determinant

[TABLE]

where $\mathcal{N}$ is chosen such that $\langle q_{n},q_{m}\rangle=\bm{1}_{nm}$ . The basis functions $Q_{n}(x,y,z)$ are defined as symmetric combinations of these 1-dimensional polynomials, in the form

[TABLE]

with $n$ representing the triple of indices $\{r,s,t\}$ . After choosing an ordering of these triples we can exchange $n$ for a simpler integer label. For a particular realization with wavenumbers in the range $k_{\text{min}}$ and $k_{\text{max}}$ we use the notation $Q_{n}(k_{1},k_{2},k_{3})$ to represent $Q_{n}(x_{1},x_{2},x_{3})$ , where $x_{i}=(k_{i}-k_{\text{min}})/(k_{\text{max}}-k_{\text{min}})\in[0,1]$ .
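The construction can be sketched numerically. Instead of the secular determinant, the sketch below orthonormalizes the monomials with a Cholesky factorization of the Gram matrix $\langle x^{i},x^{j}\rangle=w_{i+j}$ , which should give the same polynomials up to an overall sign convention. It uses the closed form $w_{n}=\tfrac{1}{2}[4/(n+2)-3/(n+3)]$ that follows from the expression for $\mathcal{T}[f]$ above. The $1/6$ normalization of the symmetrized product follows the usual convention of Fergusson et al. (2010) and is an assumption here, as are all function names.

```python
import numpy as np
from itertools import permutations

def w(n):
    """Moments w_n = T[x^n] = 0.5 * int_0^1 x^(n+1) (4 - 3x) dx."""
    return 0.5 * (4.0 / (n + 2) - 3.0 / (n + 3))

def q_coefficients(n_max):
    """Row n holds the coefficients of q_n in ascending powers of x.

    With Gram matrix G = L L^T, the rows of L^{-1} are orthonormal under
    <f, g> = T[fg]. G is ill-conditioned for large n_max, so this direct
    approach is only suitable for low polynomial orders.
    """
    m = n_max + 1
    gram = np.array([[w(i + j) for j in range(m)] for i in range(m)])
    return np.linalg.inv(np.linalg.cholesky(gram))

def q_eval(coeffs, n, x):
    """Evaluate the one-dimensional polynomial q_n at x."""
    return np.polynomial.polynomial.polyval(x, coeffs[n])

def Q_eval(coeffs, rst, x, y, z):
    """Symmetrized product of q_r, q_s, q_t (1/6 normalization assumed)."""
    return sum(q_eval(coeffs, a, x) * q_eval(coeffs, b, y) * q_eval(coeffs, c, z)
               for a, b, c in permutations(rst)) / 6.0

def rescale(k, k_min, k_max):
    """Map a wavenumber in [k_min, k_max] to x in [0, 1]."""
    return (k - k_min) / (k_max - k_min)

coeffs = q_coefficients(3)
print(np.round(coeffs, 4))
x1, x2, x3 = (rescale(k, 0.01, 0.3) for k in (0.05, 0.12, 0.15))
print(Q_eval(coeffs, (0, 1, 2), x1, x2, x3))
```

The lowest polynomial is the constant $q_{0}=\sqrt{2}$ , since $\langle 1,1\rangle=w_{0}=1/2$ . The wavenumber range in the usage example is illustrative only.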

A.2 Calculation of the modal coefficients using the voxel method

In Section 4.7 we explained how equation (47) reduces estimation of the modal coefficients from simulation or data to a single 3-dimensional integral over a product of three Fourier transforms $\mathcal{M}_{n}(\bm{\mathrm{x}})$ . If the bispectrum is given analytically, however, we may instead use the simpler equation (25) and compute the inner product using a sum of volumes of all ‘voxels’ within a cubic grid with linear spacing along each axis $(k_{1},k_{2},k_{3})$ .

To calculate the volume of each voxel we relabel the coordinates as $(x,y,z)$ , rescaled so that $0\leqslant x,y,z\leqslant 1$ . We associate each of the 8 possible vertices of the voxel with a value $p_{1},\dots,p_{8}$ , given by the product of $Q_{m}$ and $wB$ (or $Q_{m}$ and $Q_{n}$ in the case of $\langle\kern-2.4pt\langle Q_{m}|Q_{n}\rangle\kern-2.4pt\rangle$ ) at that vertex. Finally, we define an interpolation function $f$ by writing

[TABLE]

The coefficients $a_{i}$ may be obtained analytically in terms of the $p_{i}$ . We assign the volume of the voxel to be zero if fewer than four of its vertices satisfy the triangle condition, while if all $8$ vertices satisfy the triangle condition its volume is

[TABLE]

as expected. For intermediate cases we write the volume in the form

[TABLE]

where $\mathcal{C}$ indicates that only those points satisfying the triangle condition and forming a closed volume within the voxel should be included. In the case of $4$ points there are 3 possible volumes given by

[TABLE]

For $5$ points the only possibility is that $x+y+z\geqslant 2\max\{x,y,z\}$ , while for $6$ and $7$ points there are again $3$ possibilities, given respectively by,

[TABLE]

In each case the analytic form of the integral in terms of the vertex values $p_{i}$ can be calculated easily. Computation of each integral using this voxel method is highly accurate and efficient.
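The case analysis above starts from counting how many vertices of each voxel satisfy the triangle condition. The sketch below performs only that classification step on a regular grid, using integer grid coordinates so that the boundary case $x+y+z=2\max\{x,y,z\}$ is treated exactly. It does not implement the interpolated volume integrals, and the function names are hypothetical.

```python
from itertools import product

def satisfies_triangle(i, j, l):
    """Triangle condition x + y + z >= 2 max{x, y, z} in integer grid units."""
    return i + j + l >= 2 * max(i, j, l)

def count_valid_vertices(a, b, c):
    """Number of the 8 vertices of the voxel with lower corner (a, b, c)
    that satisfy the triangle condition."""
    return sum(satisfies_triangle(a + di, b + dj, c + dl)
               for di, dj, dl in product((0, 1), repeat=3))

def classify_voxels(n):
    """Histogram of valid-vertex counts over an n x n x n grid of voxels."""
    counts = {}
    for a, b, c in product(range(n), repeat=3):
        m = count_valid_vertices(a, b, c)
        counts[m] = counts.get(m, 0) + 1
    return counts

hist = classify_voxels(16)
for m in sorted(hist):
    print(m, hist[m])
```

Voxels with fewer than four valid vertices would be assigned zero volume, those with all eight would use the full-voxel integral, and the intermediate cases would use the restricted integrals described above. A single voxel spanning the whole unit box has five valid vertices, the origin, the three face-diagonal corners and the far corner.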

Bibliography

Ade P. A. R., et al., 2014, Astron. Astrophys., 571, A24
Anderson T. W., 2003, An Introduction to Multivariate Statistical Analysis. Wiley
Assassi V., Baumann D., Green D., Zaldarriaga M., 2014, JCAP, 1408, 056
Baldauf T., Seljak U., Desjacques V., McDonald P., 2012, Phys. Rev., D86, 083540
Baldauf T., Mirbabayi M., Simonović M., Zaldarriaga M., 2016
Bennett C., et al., 2003, Astrophys. J. Suppl., 148, 97
Bernardeau F., Colombi S., Gaztanaga E., Scoccimarro R., 2002, Phys. Rept., 367, 1
Blot L., Corasaniti P. S., Amendola L., Kitching T. D., 2016, Mon. Not. Roy. Astron. Soc., 458, 4462
