Sparse Bayesian mass-mapping with uncertainties: local credible   intervals

Matthew A. Price; Xiaohao Cai; Jason D. McEwen; Marcelo Pereyra,; Thomas D. Kitching (for the LSST Dark Energy Science Collaboration)

arXiv:1812.04017·astro-ph.CO·February 8, 2021

Sparse Bayesian mass-mapping with uncertainties: local credible intervals

Matthew A. Price, Xiaohao Cai, Jason D. McEwen, Marcelo Pereyra,, Thomas D. Kitching (for the LSST Dark Energy Science Collaboration)

PDF

TL;DR

This paper introduces a sparse Bayesian method for weak gravitational lensing mass-mapping that efficiently quantifies uncertainties with local credible intervals, outperforming traditional MCMC techniques in speed while maintaining comparable accuracy.

Contribution

The paper presents a novel sparse hierarchical Bayesian framework for mass-mapping that provides efficient uncertainty quantification using local credible intervals, significantly reducing computational costs.

Findings

01

Uncertainties are conservative and highly correlated with MCMC results.

02

The method achieves a computational efficiency increase of about one million times.

03

Applicable to large-scale surveys like LSST and Euclid.

Abstract

Until recently mass-mapping techniques for weak gravitational lensing convergence reconstruction have lacked a principled statistical framework upon which to quantify reconstruction uncertainties, without making strong assumptions of Gaussianity. In previous work we presented a sparse hierarchical Bayesian formalism for convergence reconstruction that addresses this shortcoming. Here, we draw on the concept of local credible intervals (cf. Bayesian error bars) as an extension of the uncertainty quantification techniques previously detailed. These uncertainty quantification techniques are benchmarked against those recovered via Px-MALA - a state of the art proximal Markov Chain Monte Carlo (MCMC) algorithm. We find that typically our recovered uncertainties are everywhere conservative, of similar magnitude and highly correlated (Pearson correlation coefficient $\geq 0.85$ ) with those…

Tables2

Table 1. Table 1: Comparisons between the local credible interval maps recovered via MAP and those recovered via Px-MALA. Note that higher super-pixels corresponds to coarser resolutions whereas smaller super-pixels leads to higher resolution. This is because the super-pixel size is the size of the groups of pixels used to tile the original image – therefore larger tiling components leads to fewer tiles, and therefore lower resolution.

Super	Pearson	SNR	RMSE
Pixel	Correlation	(dB)	Error
Bolshoi-7
4x4	0.463	11.737	25.892 $%$
8x8	0.848	11.994	25.137 $%$
16x16	0.945	12.509	23.690 $%$
Bolshoi-8
4x4	-0.168	11.467	26.710 $%$
8x8	0.929	11.490	26.637 $%$
16x16	0.941	11.350	27.070 $%$
Buzzard-1
4x4	0.164	10.666	29.289 $%$
8x8	0.916	10.473	29.948 $%$
16x16	0.984	9.262	34.427 $%$
Buzzard-2
4x4	0.140	10.653	29.333 $%$
8x8	0.904	10.465	29.973 $%$
16x16	0.926	9.217	34.605 $%$

Table 2. Table 2: Numerical comparison of computational time of Px-MALA and MAP. The MAP approach typically takes 𝒪 ( 10 − 1 ) 𝒪 superscript 10 1 \mathcal{O}(10^{-1}) seconds, compared to Px-MALA’s 𝒪 ( 10 5 ) 𝒪 superscript 10 5 \mathcal{O}(10^{5}) seconds. Therefore for linear reconstructions MAP is close to 𝒪 ( 10 6 ) 𝒪 superscript 10 6 \mathcal{O}(10^{6}) times faster.

Px-MALA	MAP	Ratio
Time (s)	Time (s)	Ratio
	Buzzard-1
133761	0.182	0.734 $\times 10^{6}$
	Buzzard-2
141857	0.175	0.811 $\times 10^{6}$
	Bolshoi-7
95339	0.153	0.623 $\times 10^{6}$
	Bolshoi-8
92929	0.143	0.650 $\times 10^{6}$

Equations78

p (κ ∣ γ) = \frac{p ( γ ∣ κ ) p ( κ )}{\int _{C^{N}} p ( γ ∣ κ ) p ( κ ) d κ} .

p (κ ∣ γ) = \frac{p ( γ ∣ κ ) p ( κ )}{\int _{C^{N}} p ( γ ∣ κ ) p ( κ ) d κ} .

Φ \in C^{M \times N} : κ \in C^{N} \mapsto γ \in C^{M} .

Φ \in C^{M \times N} : κ \in C^{N} \mapsto γ \in C^{M} .

γ = Φ κ + n .

γ = Φ κ + n .

Φ = F^{- 1} D F .

Φ = F^{- 1} D F .

D_{k_{x}, k_{y}} = \frac{k _{x}^{2} - k _{y}^{2} + 2 i k _{x} k _{y}}{k _{x}^{2} + k _{y}^{2}} .

D_{k_{x}, k_{y}} = \frac{k _{x}^{2} - k _{y}^{2} + 2 i k _{x} k _{y}}{k _{x}^{2} + k _{y}^{2}} .

p(\gamma|\kappa)\propto\exp\Bigg{(}\frac{-\lVert\bm{\Phi}\kappa-\gamma\rVert_{2}^{2}}{2\sigma_{n}^{2}}\Bigg{)},

p(\gamma|\kappa)\propto\exp\Bigg{(}\frac{-\lVert\bm{\Phi}\kappa-\gamma\rVert_{2}^{2}}{2\sigma_{n}^{2}}\Bigg{)},

p(\kappa)\propto\exp\Big{(}-\mu\lVert\bm{\Psi}^{{\dagger}}\kappa\rVert_{1}\Big{)},

p(\kappa)\propto\exp\Big{(}-\mu\lVert\bm{\Psi}^{{\dagger}}\kappa\rVert_{1}\Big{)},

p(\kappa|\gamma)\propto\exp\Bigg{\{}-\Bigg{(}\mu\lVert\bm{\Psi}^{{\dagger}}\kappa\rVert_{1}+\frac{\lVert\bm{\Phi}\kappa-\gamma\rVert_{2}^{2}}{2\sigma_{n}^{2}}\Bigg{)}\Bigg{\}}.

p(\kappa|\gamma)\propto\exp\Bigg{\{}-\Bigg{(}\mu\lVert\bm{\Psi}^{{\dagger}}\kappa\rVert_{1}+\frac{\lVert\bm{\Phi}\kappa-\gamma\rVert_{2}^{2}}{2\sigma_{n}^{2}}\Bigg{)}\Bigg{\}}.

\displaystyle\begin{split}\kappa^{\text{map}}&=\operatorname*{argmax}_{\kappa}\big{\{}p(\kappa|\gamma)\big{\}}\\ &=\operatorname*{argmin}_{\kappa}\big{\{}-\log(\;p(\kappa|\gamma)\;)\big{\}}\\ &=\operatorname*{argmin}_{\kappa}\Bigg{\{}\underbrace{\mu\lVert\bm{\Psi}^{{\dagger}}\kappa\rVert_{1}}_{f(\kappa)}+\underbrace{{\lVert\bm{\Phi}\kappa-\gamma\rVert_{2}^{2}}/{2\sigma_{n}^{2}}}_{g(\kappa)}\Bigg{\}},\end{split}

\displaystyle\begin{split}\kappa^{\text{map}}&=\operatorname*{argmax}_{\kappa}\big{\{}p(\kappa|\gamma)\big{\}}\\ &=\operatorname*{argmin}_{\kappa}\big{\{}-\log(\;p(\kappa|\gamma)\;)\big{\}}\\ &=\operatorname*{argmin}_{\kappa}\Bigg{\{}\underbrace{\mu\lVert\bm{\Psi}^{{\dagger}}\kappa\rVert_{1}}_{f(\kappa)}+\underbrace{{\lVert\bm{\Phi}\kappa-\gamma\rVert_{2}^{2}}/{2\sigma_{n}^{2}}}_{g(\kappa)}\Bigg{\}},\end{split}

l^{(m + 1)} = l^{(m)} + \frac{δ}{2} \nabla lo g p [l^{(m)}] + δ w^{(m + 1)},

l^{(m + 1)} = l^{(m)} + \frac{δ}{2} \nabla lo g p [l^{(m)}] + δ w^{(m + 1)},

p (κ ∣ γ) \propto exp {- f (κ) - g (κ)},

p (κ ∣ γ) \propto exp {- f (κ) - g (κ)},

f (κ) = μ ∥ Ψ^{†} κ ∥_{1}, and g (κ) = ∥ Φ κ - γ ∥_{2}^{2} / 2 σ_{n}^{2} .

f (κ) = μ ∥ Ψ^{†} κ ∥_{1}, and g (κ) = ∥ Φ κ - γ ∥_{2}^{2} / 2 σ_{n}^{2} .

f^{λ} (κ) \equiv \overset{κ}{^} \in C^{N} min {f (\overset{κ}{^}) + ∥ \overset{κ}{^} - κ ∥^{2} /2 λ},

f^{λ} (κ) \equiv \overset{κ}{^} \in C^{N} min {f (\overset{κ}{^}) + ∥ \overset{κ}{^} - κ ∥^{2} /2 λ},

prox_{f}^{λ} (κ) \equiv \overset{κ}{^} \in C^{N} argmin {f (\overset{κ}{^}) + ∥ \overset{κ}{^} - κ ∥^{2} /2 λ} .

prox_{f}^{λ} (κ) \equiv \overset{κ}{^} \in C^{N} argmin {f (\overset{κ}{^}) + ∥ \overset{κ}{^} - κ ∥^{2} /2 λ} .

l^{(m + 1)} = (1 - \frac{δ}{λ}) l^{(m)} + \frac{δ}{λ} prox_{f}^{λ} (l^{(m)}) - δ \nabla g (l^{(m)}) + 2 δ w^{(m)},

l^{(m + 1)} = (1 - \frac{δ}{λ}) l^{(m)} + \frac{δ}{λ} prox_{f}^{λ} (l^{(m)}) - δ \nabla g (l^{(m)}) + 2 δ w^{(m)},

p (κ \in C_{α} ∣ γ) = \int_{κ \in C^{N}} p (κ ∣ γ) I_{C_{α}} d κ = 1 - α .

p (κ \in C_{α} ∣ γ) = \int_{κ \in C^{N}} p (κ ∣ γ) I_{C_{α}} d κ = 1 - α .

C_{α} := {κ : f (κ) + g (κ) \leq ϵ_{α}},

C_{α} := {κ : f (κ) + g (κ) \leq ϵ_{α}},

C_{α}^{'} := {κ : f (κ) + g (κ) \leq ϵ_{α}^{'}} .

C_{α}^{'} := {κ : f (κ) + g (κ) \leq ϵ_{α}^{'}} .

ϵ_{α}^{'} = f (κ^{map}) + g (κ^{map}) + τ_{α} N + N,

ϵ_{α}^{'} = f (κ^{map}) + g (κ^{map}) + τ_{α} N + N,

0 \leq ϵ_{α}^{'} - ϵ_{α} \leq η_{α} N + N,

0 \leq ϵ_{α}^{'} - ϵ_{α} \leq η_{α} N + N,

\xi_{+,\Omega_{i}}=\operatorname*{max}_{\xi}\big{\{}\xi|f(\mathbf{\kappa}_{i,\xi})+g(\mathbf{\kappa}_{i,\xi})\leq\epsilon^{\prime}_{\alpha},\>\forall\xi\in\mathbb{R}\big{\}}

\xi_{+,\Omega_{i}}=\operatorname*{max}_{\xi}\big{\{}\xi|f(\mathbf{\kappa}_{i,\xi})+g(\mathbf{\kappa}_{i,\xi})\leq\epsilon^{\prime}_{\alpha},\>\forall\xi\in\mathbb{R}\big{\}}

\xi_{-,\Omega_{i}}=\operatorname*{min}_{\xi}\big{\{}\xi|f(\mathbf{\kappa}_{i,\xi})+g(\mathbf{\kappa}_{i,\xi})\leq\epsilon^{\prime}_{\alpha},\>\forall\xi\in\mathbb{R}\big{\}},

\xi_{-,\Omega_{i}}=\operatorname*{min}_{\xi}\big{\{}\xi|f(\mathbf{\kappa}_{i,\xi})+g(\mathbf{\kappa}_{i,\xi})\leq\epsilon^{\prime}_{\alpha},\>\forall\xi\in\mathbb{R}\big{\}},

γ^{in} = Φ κ^{in} .

γ^{in} = Φ κ^{in} .

γ = γ^{in} + n .

γ = γ^{in} + n .

σ_{n} = \frac{∥ Φ κ ∥ _{2}^{2}}{N} \times 1 0^{- \frac{SNR}{20}} .

σ_{n} = \frac{∥ Φ κ ∥ _{2}^{2}}{N} \times 1 0^{- \frac{SNR}{20}} .

r = \frac{\sum _{i = 1}^{N^{'}} ( ξ ^{map} ( i ) - ξ ˉ ^{map} ) ( ξ ^{px} ( i ) - ξ ˉ ^{px} )}{\sum _{i = 1}^{N^{'}} ( ξ ^{map} ( i ) - ξ ˉ ^{map} ) ^{2} \sum _{i = 1}^{N^{'}} ( ξ ^{px} ( i ) - ξ ˉ ^{px} ) ^{2}},

r = \frac{\sum _{i = 1}^{N^{'}} ( ξ ^{map} ( i ) - ξ ˉ ^{map} ) ( ξ ^{px} ( i ) - ξ ˉ ^{px} )}{\sum _{i = 1}^{N^{'}} ( ξ ^{map} ( i ) - ξ ˉ ^{map} ) ^{2} \sum _{i = 1}^{N^{'}} ( ξ ^{px} ( i ) - ξ ˉ ^{px} ) ^{2}},

\text{SNR}=20\times\log_{10}\Bigg{(}\frac{\lVert\xi^{\text{px}}\rVert_{2}}{\lVert\xi^{\text{px}}-\xi^{\text{map}}\rVert_{2}}\Bigg{)},

\text{SNR}=20\times\log_{10}\Bigg{(}\frac{\lVert\xi^{\text{px}}\rVert_{2}}{\lVert\xi^{\text{px}}-\xi^{\text{map}}\rVert_{2}}\Bigg{)},

\text{RMSE}=100\times\Bigg{(}\frac{\lVert\xi^{\text{px}}-\xi^{\text{map}}\rVert_{2}}{\lVert\xi^{\text{px}}\rVert_{2}}\Bigg{)}\%.

\text{RMSE}=100\times\Bigg{(}\frac{\lVert\xi^{\text{px}}-\xi^{\text{map}}\rVert_{2}}{\lVert\xi^{\text{px}}\rVert_{2}}\Bigg{)}\%.

f (η κ) = η^{k} f (κ), \forall κ \in R^{n}, \forall η > 0.

f (η κ) = η^{k} f (κ), \forall κ \in R^{n}, \forall η > 0.

C (μ) = \int_{C^{N}} exp {- μ f (κ)} d κ .

C (μ) = \int_{C^{N}} exp {- μ f (κ)} d κ .

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Full text

Sparse Bayesian mass-mapping with uncertainties: local credible intervals

M. A.Price1, X. Cai1, J. D. McEwen1, M. Pereyra2, T. D. Kitching1 (for the LSST Dark Energy Science Collaboration)

1Mullard Space Science Laboratory, University College London, RH5 6NT, UK.

2Maxwell Institute for Mathematical Sciences, Heriot-Watt University, Edinburgh EH14 4AS, United Kingdom E-mail: [email protected]

(Accepted XXX. Received YYY; in original form ZZZ)

Abstract

Until recently mass-mapping techniques for weak gravitational lensing convergence reconstruction have lacked a principled statistical framework upon which to quantify reconstruction uncertainties, without making strong assumptions of Gaussianity. In previous work we presented a sparse hierarchical Bayesian formalism for convergence reconstruction that addresses this shortcoming. Here, we draw on the concept of local credible intervals (cf. Bayesian error bars) as an extension of the uncertainty quantification techniques previously detailed. These uncertainty quantification techniques are benchmarked against those recovered via Px-MALA – a state of the art proximal Markov Chain Monte Carlo (MCMC) algorithm. We find that typically our recovered uncertainties are everywhere conservative (never underestimate the uncertainty, yet the approximation error is bounded above), of similar magnitude and highly correlated with those recovered via Px-MALA. Moreover, we demonstrate an increase in computational efficiency of $\mathcal{O}(10^{6})$ when using our sparse Bayesian approach over MCMC techniques. This computational saving is critical for the application of Bayesian uncertainty quantification to large-scale stage @slowromancapiv@ surveys such as LSST and Euclid.

keywords:

gravitational lensing: weak – Methods: statistical – Methods: data analysis – techniques: image processing

††pubyear: 2018††pagerange: Sparse Bayesian mass-mapping with uncertainties: local credible intervals–A

1 Introduction

As photons from distant sources (galaxies) travel through spacetime to us here and now their trajectories are perturbed by local mass over and under-densities, causing the observed shapes of structures to be warped, or gravitationally lensed. This cosmological effect is sensitive to all matter (both visible and invisible), and so provides a natural cosmological probe of dark matter.

The gravitational lensing effect has (at first order) two distinct effects: distant galaxies are magnified by a convergence field $\kappa$ ; and the third-flattening (ellipticity) is perturbed from an underlying intrinsic value by a shear field $\gamma$ . A wide range of cosmology can be extracted from just the shear field (Taylor et al., 2018; Alsing et al., 2016), though increasingly higher order statistics (Munshi et al., 2008; Heavens, 2009; Munshi & Coles, 2017; Coles & Chiang, 2000) are being computed on convergence maps directly.

As a result of the mass-sheet degeneracy (an a priori degeneracy of the intrinsic brightness of galaxies, see Bartelmann & Schneider, 2001) the convergence field cannot be observed directly. Instead measurements of the shear field $\gamma$ must be taken and inverted through some mapping to create an estimator for $\kappa$ . Typically, these inverse problems are ill-posed (often seriously) and so creating unbiased estimators for the convergence $\kappa$ can prove difficult.

Many convergence inversion techniques have been considered (e.g. VanderPlas et al., 2011; Lanusse et al., 2016; Wallis et al., 2017; Jeffrey et al., 2018; Chang et al., 2018) though the simplest, most direct method in the planar setting is that of Kaiser-Squires (KS) inversion (Kaiser & Squires, 1993). Though these methods often produce reliable estimates of $\kappa$ , they all either lack principled statistical uncertainties on their reconstructions or make strong assumptions of Gaussianity (which heavily degrades the quality of non-Gaussian information in particular).

For example, Wiener filtering (see e.g Horowitz et al., 2018) directly adopts Gaussian priors which are more explicit assumptions of Gaussianity. Other approaches, such as the KS method recover a noisy convergence estimate which is post-processed via convolution with a Gaussian kernel, which promotes Gaussianity.

On large scales the lensing information is primarly Gaussian in nature, though on smaller scales (at higher resolutions) there becomes a non-neglibile non-Gaussian contribution which encodes information about baryonic interactions and clustering amongst other non-linear effects. Analysis of such effects is expected (Munshi et al., 2008) to provide competitive and more importantly complementary constraints on cosmological parameters – in particular parameters related closely to dark matter such as $\sigma_{8}$ and $\Omega_{M}$ . Consequently, mapping techniques which preserve the non-Gaussian information content are a crucial step forward for dark matter analysis via weak gravitational lensing.

In previous work (Price et al., 2018) we presented a new sparse hierarchical Bayesian formalism for reconstructing the convergence field. This not only regularizes the ill-posed inverse problem but allows us to explore the Bayesian posterior in order to recover principled uncertainties on our reconstruction. It is important to note here that this mathematical framework is entirely general and can be applied for any posterior which belongs to the set of log-concave functions – of which both sparsity enforcing Laplace type priors and standard Gaussian priors are members.

Often hierarchical Bayesian inference problems are solved by Markov Chain Monte Carlo (MCMC) techniques (see e.g. Trotta, 2017), which explicitly return a large number of samples from the full posterior distribution – from which one can construct true Bayesian uncertainties. Samples of the posterior via MCMC algorithms construct theoretically optimal estimates of the posterior (in the limit of a large number of samples), but in practice can be extremely computationally taxing to recover fully.

In fact, when the dimensionality becomes large these methods become infeasible – often referred to as the curse of dimensionality. In the context of lensing inverse problems each pixel constitutes a dimension, and so for a resolution of $1024\times 1024$ (which is typical) the dimension of the problem is $\mathcal{O}(10^{6})$ .

Recent advancements in probability density theory (Robert, 2001) allow conservative approximations of Bayesian credible regions of the posterior from knowledge of the MAP solution alone (Pereyra, 2017). The sparse Bayesian method presented in previous work (see Price et al., 2018) recasts the maximization of the posterior distribution as a convex optimization problem from which the maximum a posteriori (MAP) solution can be rapidly computed. Uncertainty quantification is then conducted utilizing the aforementioned approximate credible regions of the posterior. In Price et al. (2018) hypothesis testing (determining the statistical significance of a feature of the recovered convergene map) was introduced to the weak lensing setting as a form of uncertainty quantification.

In this article we introduce a further uncertainty quantification technique called local credible intervals (cf. pixel-level error bars). Both hypothesis testing and local credible intervals were previously developed and applied to the radio interferometric setting (Cai et al., 2017a, b). We also remark that there are alternative ways of testing image structures (Repetti et al., 2018). This paper serves as a benchmark comparison of our sparse hierarchical Bayesian formalism (see Price et al., 2018) to a bespoke MCMC algorithm, Px-MALA (Cai et al., 2017a, b; Durmus et al., 2016; Pereyra, 2016). Px-MALA utilizes Moreau-Yoshida envelopes and proximity operators (tools from convex analyses) to support non-differentiable terms in the prior or likelihood, making it one of the only somewhat efficient ways to support non-smooth sparsity-promoting priors (on which our sparse Bayesian mass-mapping framework is based) in high dimensional settings.

The remainder of this article is structured as follows. We begin with section 2 in which we review our sparse hierarchical Bayesian models for mass-mapping and present a brief overview of the Px-MALA MCMC algorithm. We then cover the relevant mathematical background of approximate Bayesian uncertainty quantification in section 3 before introducing the concept of local credible intervals – an additional form of uncertainty quantification. In section 4, we conduct a series of mock scenarios to compare the uncertainties recovered by our maximum a posteriori (MAP) approach, and the full MCMC (Px-MALA) treatment. Finally we draw conclusions and discuss future work in section 5.

Section 2 relies on a strong understanding of Bayesian inference and MCMC techniques along with a moderate understanding of proximal calculus and compressed sensing. As such, for the reader interested only in the application and benchmarking section 4 onwards is relevant content.

2 Hierarchical Bayesian Inference for Mass-mapping

Hierarchical Bayesian models provide a flexible, well defined approach for dealing with uncertainties in a variety of problems. For an overview of Bayesian hierarchical modeling and MCMC techniques in the context of astrophysics we refer the reader to Trotta (2017).

We begin by presenting an overview of the sparse hierarchical Bayesian approach developed in previous work (see Price et al., 2018), where we also review the weak lensing planar forward model. Following this we make the MAP optimization problem explicit. We then review the Bayesian parameter inference hierarchy adopted in our sparse Bayesian mass-mapping algorithm (Price et al., 2018). Finally we provide a short introduction to the Px-MALA and MYULA proximal Markov chain Monte-Carlo algorithms (Durmus et al., 2016; Pereyra, 2016).

2.1 Bayesian Inference

Mathematically, let us begin by considering the posterior distribution which by Bayes’ Theorem is given by

[TABLE]

Bayes’ theorem relates the posterior distribution $p(\kappa|\gamma)$ to the product of some likelihood function $p(\gamma|\kappa)$ and some prior $p(\kappa)$ . It is important to note here that a model is implicit which collectively defines the noise and the proposed relationship between observations $\gamma$ and inferences $\kappa$ – specifically this term characterizes the noise model and the assumed mapping $\{\kappa\mapsto\gamma\}$ . Note that the denominator in equation (1) is the model’s marginal likelihood which is unrelated to $\kappa$ .

Suppose the discretized complex shear field $\gamma\in\mathbb{C}^{M}$ and the discretized complex convergence field $\kappa\in\mathbb{C}^{N}$ – where $M$ represents the number of binned shear measurements and $N$ represents the dimensionality of the convergence estimator – are related by a measurement operator $\bm{\Phi}\in\mathbb{C}^{M\times N}$ defined such that

[TABLE]

Further, suppose a contaminating noise $n$ is present. Measurements of $\gamma$ are produced via

[TABLE]

For the case considered within this paper, we take $n\sim\mathcal{N}(0,\sigma_{n}^{2})\in\mathbb{C}^{M}$ – i.e. i.i.d. (independent and identically distributed) additive Gaussian noise. For the purpose of this paper we consider the simplest planar mapping,

[TABLE]

Here, $\bm{\mathsf{F}}$ ( $\bm{\mathsf{F}}^{-1}$ ) is the forward (inverse) discrete fast Fourier transforms and $\bm{\mathsf{D}}$ is the weak lensing planar forward-model in Fourier space (e.g. Kaiser & Squires, 1993),

[TABLE]

The measurement operator $\bm{\Phi}$ has also been extended to super-resolution image recovery (Price et al., 2018), but that is beyond the scope of this paper.

In the majority of weak lensing surveys $M<N$ (i.e. the shear field is a discrete under-sampling of the underlying convergence field) and so inverting the forward-model is typically ill-posed (often seriously). To regularize ill-posed inverse problems a term encoding prior information is introduced – this is referred to either as the prior or regularization term.

We choose a prior which reflects the quasi-philosophical notion of Occam’s Razor – a prior which says if two solutions are equally viable, the one which makes the fewest assumptions (the fewest active variables – non-zero coefficients in a sparse domain) is more likely to be true. Mathematically, this is equivalent to imposing sparsity that minimizes the number of non-zero coefficients in a sparse representation (dictionary).

One could select any sparsifying domain, though a natural choice for most physical systems are wavelets. We choose to use wavelets as our sparsifying dictionary in this paper and in previous work.

The natural sparsity-promoting prior is the $\ell_{0}$ -norm $\lVert.\rVert_{0}$ , often referred to as the Hamming distance – i.e. the total number of non-zero coefficients of a field. However, this function is non-differentiable and (perhaps more importantly) non-convex. As such it cannot exploit the computational advantages provided by conventional convex optimization techniques.

Researchers therefore often select the next most natural sparsity-promoting prior, the $\ell_{1}$ -norm $\lVert.\rVert_{1}$ , which is convex and can be shown to share the same MAP (maximum-a-posteriori) solution as if one were to use the $\ell_{0}$ -norm in certain cases (see e.g. Donoho, 2006; Candès & Wakin, 2008, on convex relaxation).

We now define the likelihood function (data fidelity term) as a multivariate Gaussian with diagonal covariance $\Sigma=\sigma_{n}^{2}\mathbb{I}$ such that,

[TABLE]

which (as in Price et al., 2018) is regularized by a non-differentiable Laplace-type sparsity-promoting wavelet prior

[TABLE]

where $\bm{\Psi}$ is an appropriately selected sparsifying dictionary (such as a wavelet dictionary) in which the signal is assumed to be sparse, and $\mu\in\mathbb{R}_{+}$ is a regularization parameter. Substituting $p(\gamma|\kappa)$ and $p(\kappa)$ into equation (1) yields

[TABLE]

Note that one can choose any convex log-priors e.g. an $\ell_{2}$ -norm prior from which one essentially recovers Weiner filtering (see Padmanabhan et al., 2003; Horowitz et al., 2018, for alternate iterative Weiner filtering approaches).

2.2 Sparse MAP estimator

Drawing conclusions directly from $p(\kappa|\gamma)$ can be difficult because of the high dimensionality involved, which will be detailed in the next section. As an alternative, Bayesian methods often derive solutions by computing estimators that summarize $p(\kappa|\gamma)$ , such as maximizing the probability of the recovered $\kappa$ conditional on the data $\gamma$ . Such a solution is referred to as the MAP solution. From the monotonicity of the logarithm function it is evident that,

[TABLE]

which is a convex minimization problem and can therefore be computed in a highly computationally efficient manner.

To solve the convex minimization problem given in equation (9) we implement an adapted forward-backward splitting algorithm (Combettes & Pesquet, 2009). A complete description of the steps adopted when solving this optimization problem, and the full details of the sparse hierarchical Bayesian formalism are outlined in previous work (Cai et al., 2017b; Price et al., 2018).

2.3 Sparse Dictionary and Regularization Parameter

Here we provide a concise overview of the parameter selection aspect of our sparse Bayesian mass-mapping algorithm which was developed and presented in previous work – for a complete description see Price et al. (2018); Pereyra et al. (2015).

The prior term in equation (9) promotes the a priori knowledge that the signal of interest $\kappa$ is likely to be sparse in a given dictionary $\bm{\Psi}$ . A function $f(x)$ is sparse in a given dictionary $\bm{\Psi}$ if the number of non-zero coefficients is small compared to the total size of the dictionary domain. Wavelets form a general set of naturally sparsifying dictionaries for a wide-range of physical problems – and have recently been shown to work well in the weak lensing setting (Jeffrey et al., 2018; Lanusse et al., 2016; Peel et al., 2017; Price et al., 2018). For the purpose of this paper we restrict ourselves to Daubechies 8 (DB8) wavelets (with 8 wavelet levels) for simplicity though a wide variety of wavelets could be considered (e.g. Carrillo et al., 2012; Starck et al., 2015; Pires et al., 2009). Note that the exact choice of wavelet representation (and prior) is independent from the results of this benchmarking paper, thus discussion of dictionary (and prior) optimality is not of primary concern.

An issue in these types of regularized optimization problems is the setting of regularization parameter $\mu$ - several approaches have been presented (Lanusse et al., 2016; Peel et al., 2017; Paykari et al., 2014; Jeffrey et al., 2018). For uncertainties on reconstructed $\kappa$ maps to be truly principled $\mu$ must be computed in a well defined, statistically principled way. In Price et al. (2018) a hierarchical Bayesian inference approach to compute the theoretically optimal $\mu$ was adopted, which we outline in appendix A

2.4 Proximal MCMC Sampling

Sampling a full posterior distribution is very challenging in high dimensional settings, particularly when the prior $p(\kappa)$ considered is non-differentiable – like the sparsity-promoting prior given in equation (7). In the following, we recall two proximal MCMC methods developed in Durmus et al. (2016); Pereyra (2016) – MYULA and Px-MALA – which can be applied to sample the full posterior density $p(\kappa|\gamma)$ for mass-mapping. After a set of samples has been obtained, various kinds of analysis can be performed, such as summary estimators of $\kappa$ , and a range of uncertainty quantification techniques, as presented in Cai et al. (2017a, b).

For a probability density $p\in\mathcal{C}^{1}$ with Lipschitz gradient, the Markov chain of the unadjusted Langevin algorithm (ULA) to generate a set of samples $\{{\bm{l}}^{(m)}\}_{\in\mathbb{C}^{N}}$ based on a forward Euler-Maruyama approximation with step-size $\delta>0$ has the form

[TABLE]

where ${\bm{w}}^{(m+1)}\sim{\cal N}(0,\mathbb{1}_{N})$ (an $N$ -sequence of standard Gaussian random variables).

However, the chain generated by ULA given above converges to ${p}$ with asymptotic bias. This kind of bias can be corrected at the expense of some additional estimation variance (Roberts & Tweedie, 1996) after involving a Metropolis-Hasting (MH) accept-reject step in ULA, which results in the MALA algorithm (Metropolis-adjusted Langevin Algorithm). However, the convergence of ULA and MALA is limited to a continuously differentiable $\log{p}$ with Lipschitz gradient, which prohibits their application to our focus on mass-mapping with non-differentiable sparsity-promoting prior in equation ( 7).

Proximal MCMC methods – such as MYULA and Px-MALA – can be used to address non-differentiable sparsity-promoting priors (Durmus et al., 2016; Pereyra, 2016). Without loss of generality, consider a log-concave posterior which is of the exponential family

[TABLE]

for lower semi-continuous convex and Lipschitz differentiable log-likelihood $g(x)\in\mathcal{C}^{1}$ and lower semi-continuous convex log-prior $f(x)\notin\mathcal{C}^{1}$ . It is worth noting that this is precisely the setting adopted within this paper, where from (8)

[TABLE]

To sample this posterior the gradient $\nabla\log p$ is required, however $f(x)$ is not Lipschitz differentiable. To account for the non-differentiability of $f(x)$ let us now define the smooth approximation $p_{\lambda}(\kappa|\gamma)\propto\exp{\{-f^{\lambda}(\kappa)-g(\kappa)\}}$ , where

[TABLE]

is the $\lambda$ -Moreau-Yosida envelope of $f$ , which can be made arbitrarily close to $f$ by letting $\lambda\rightarrow 0$ (see Parikh & Boyd 2014). Then we have $\underset{\lambda\rightarrow 0}{\lim}p_{\lambda}(\kappa|\gamma)=p(\kappa|\gamma)$ , and more importantly that, for any $\lambda>0$ , the total-variation distance between the distributions $p_{\lambda}$ and $p$ is bounded by $\|p_{\lambda}-p\|_{TV}\leq\lambda\mu N$ , providing an explicit bound on the estimation errors involved in using $p_{\lambda}$ instead of $p$ (see Durmus et al. 2016 for details). Also, the gradient $\nabla\log p_{\lambda}=-\nabla f^{\lambda}-\nabla g$ is always Lipschitz continuous, with $\nabla f^{\lambda}(\kappa)=\big{(}\kappa-{\rm prox}_{f}^{\lambda}(\kappa)\big{)}/\lambda$ , where ${\rm prox}_{f}^{\lambda}(\kappa)$ is the proximity operator of $f$ at $\kappa$ defined as

[TABLE]

Replacing $\nabla\log p$ by $\nabla\log p_{\lambda}$ in the Markov chain of ULA and MALA given in (10) yields,

[TABLE]

which is named the MYULA algorithm (Moreau-Yosida regularised ULA). The MYULA chain (15), with small $\lambda$ , efficiently delivers samples that are approximately distributed according to the posterior $p(\kappa|\gamma)$ . By analogy with the process used to obtain MALA from ULA, we create the Px-MALA (proximal MALA) after involving an MH (Metropolis-Hasting) accept-reject step in MYULA.

Essentially, the main difference of the two proximal MCMC methods (MYULA and Px-MALA) is that Px-MALA includes a Metropolis-Hastings step which is used to correct the bias present in MYULA. Therefore, Px-MALA can provide results with more accuracy, at the expense of a higher computational cost and slower convergence (Pereyra, 2016). Note, however, that these MCMC methods (as with any MCMC method) will suffer when scaling to high-dimensional data. Refer to e.g. Durmus et al. (2016); Pereyra (2016); Cai et al. (2017a) for more detailed description of the proximal MCMC methods.

In this article, akin to the experiments performed in Cai et al. (2017b), we use the proximal MCMC method Px-MALA as a benchmark in the subsequent numerical tests presented in this work.

3 Approximate Bayesian Uncertainty Quantification

Though MAP solutions are theoretically optimal (most probable, given the data) one is often interested in the posterior distribution about this MAP point estimate – a necessity if one wishes to be confident in one’s result. As described in section 2.4 we can recover this posterior distribution completely using proximal MCMC techniques such as Px-MALA. However, these approaches are highly computationally demanding. They are feasible in the planar setting at a resolution of $256\times 256$ , where computation is of $\mathcal{O}(30\;\text{hours})$ , but quickly become unrealistic for high resolutions.

More fundamentally, if we extend mass-mapping from the planar setting to the spherical setting (Wallis et al., 2017) the wavelet and measurement operators become more complex – fast Fourier transforms are replaced with full spherical harmonic transforms – and recovery of the posterior via MCMC techniques become highly computationally challenging at high resolutions.

In stark contrast to traditional MCMC techniques, recent advances in probability density theory have paved the way for efficient calculation of theoretically conservative approximate Bayesian credible regions of the posterior (Pereyra, 2017). This approach allows us to extract useful information from the posterior without explicitly having to sample the full posterior. Crucially, this approach is shown to be many orders of magnitude less computationally demanding than state-of-the-art MCMC methods (Cai et al., 2017a) and can be parallelized and distributed.

In the following section we formally define the concept of a Bayesian credible region of the posterior. We discuss limitations of computing these credible regions and highlight recently proposed approximations to Bayesian credible region. Finally we outline recently developed computationally efficient uncertainty quantification techniques which can easily scale to high-dimensional data. Specifically, we introduce the concept of local credible intervals (cf. pixel level error bars) presented first in Cai et al. (2017b) to the weak lensing setting.

3.1 Highest Posterior Density

A posterior credible region at $100(1-\alpha)\%$ confidence is a set $C_{\alpha}\in\mathbb{C}^{N}$ which satisfies

[TABLE]

Generally there are many regions which satisfy this constraint. The minimum volume, and thus decision-theoretical optimal (Robert, 2001), region is the highest posterior density (HPD) credible region, defined to be

[TABLE]

where $f(\kappa)$ is the prior and $g(\kappa)$ is the data fidelity (likelihood) term. In the above equation $\epsilon_{\alpha}$ is an isocontour (i.e. level-set) of the log-posterior set such that the integral constraint in equation (16) is satisfied. In practice the dimension $N$ of the problem is large and the calculation of the true HPD credible region is difficult to compute.

Recently a conservative approximation of $C_{\alpha}$ has been derived (Pereyra, 2017), which can be used to tightly constrain the HPD credible region without having to explicitly calculate the integral in equation (16):

[TABLE]

By construction this approximate credible-region is conservative, which is to say that $\{C_{\alpha}\subset C_{\alpha}^{\prime}\}$ . Importantly, this means that if a $\kappa$ map does not belong to $C_{\alpha}^{\prime}$ then it necessarily cannot belong to $C_{\alpha}$ . The approximate level-set threshold $\epsilon^{\prime}_{\alpha}$ at confidence $100(1-\alpha)\%$ is given by

[TABLE]

where we recall $N$ is the dimension of $\kappa$ . The constant $\tau_{\alpha}=\sqrt{16\log(3/\alpha)}$ quantifies the envelope required such that the HPD credible-region is a sub-set of the approximate HPD credible-region. There exists an upper bound on the error introduced through this approximation, which is given by

[TABLE]

where the factor $\eta_{\alpha}=\sqrt{16\log(3/\alpha)}+\sqrt{1/\alpha}$ . This approximation error scales at most linearly with $N$ . As will be shown in this paper this upper bound is typically extremely conservative in practice, and the error small.

We now introduce a recently proposed strategy for uncertainty quantification building on the concept of approximate HPD credible-regions. For further details on the strategy we recommend the reader see related work (Cai et al., 2017b).

3.2 Local Credible Intervals

Local credible intervals can be interpreted as error bars on individual pixels or super-pixel regions (collection of pixels) of a reconstructed $\kappa$ map. This concept can be applied to any method for which the HPD credible-region (and thus the approximate HPD credible-region) can be computed. Mathematically local credible intervals can be computed as follows (Cai et al., 2017b).

Select a partition of the $\kappa$ domain $\Omega=\cup_{i}\Omega_{i}$ such that super-pixels $\Omega_{i}$ (e.g. an $8\times 8$ block of pixels) are independent sub-sets of the $\kappa$ domain $\Omega_{i}\cap\Omega_{j}\>=\varnothing,\>\forall\>\{i\neq j\}$ . Clearly, provided the super-pixels $\Omega_{i}$ completely tessellate $\Omega$ they can be of arbitrary dimension. We define indexing notation on the super-pixels $\Omega_{i}$ via the index operator $\zeta_{\Omega_{i}}$ which satisfy analogous relations to the standard set indicator function given in equation (33) – i.e. $\zeta_{\Omega_{i}}=1$ if the pixel of the convervence map $\kappa$ belongs to $\Omega_{i}$ and [math] otherwise.

For a given super-pixel region $\Omega_{i}$ we quantify the uncertainty by finding the upper and lower bounds $\xi_{+,\Omega_{i}}$ , $\xi_{-,\Omega_{i}}$ respectively, which raise the objective function above the approximate level-set threshold $\epsilon^{\prime}_{\alpha}$ (or colloquially, ‘saturate the HPD credible region $C_{\alpha}^{\prime}$ ’). In a mathematical sense these bounds are defined by

[TABLE]

and

[TABLE]

where $\mathbf{\kappa}_{i,\xi}=\kappa^{\text{map}}(\mathbf{I}-\zeta_{\Omega_{i}})+\xi\zeta_{\Omega_{i}}$ is a surrogate solution where the super-pixel region has been replaced by a uniform intensity $\xi$ . We then construct the difference image $\sum_{i}(\xi_{+,\Omega_{i}}-\xi_{-,\Omega_{i}})$ which represents the length of the local credible intervals (cf. error bars) on given super-pixel regions at a confidence of $100(1-\alpha)\%$ .

In this paper we locate $\xi_{\pm}$ iteratively via bisection, though faster converging algorithms could be used to further increase computational efficiency. A schematic diagram for constructing local credible intervals is found in Figure 1. Conceptually, this is finding the maximum and minimum constant values which a super-pixel region could take, at $100(1-\alpha)\%$ confidence – which is effectively Bayesian error bars on the convergence map.

In plain english, starting from the MAP convergence solution $\kappa^{\text{map}}$ – at which all pixels are in positions which minimize the objective function – we then select a sub-set of the pixels (e.g. an $8\times 8$ block of pixels). We start by averaging the pixels in the block which is selected. We then set the pixels within this block to the average value. Following this we iteratively raise/lower the now uniform value of the pixels within this block whilst keeping the rest of the image fixed. After each iteration we check if the surrogate solution ( $\kappa^{\text{map}}$ with the block of interest replaced by some constant value) is an acceptable solution (i.e. the objective function is below the threshold $\epsilon_{\alpha}^{\prime}$ ). We find the values (upper and lower bounds) at which objective function is equal to the threshold $\epsilon_{\alpha}^{\prime}$ . We then take the difference between these bounds, which is the local credible interval for a given ‘block of interest’ (super-pixel region).

4 Evaluation on Simulations

For computing Bayesian inference problems one would ideally adopt an MCMC approach as they are (assuming convergence) guaranteed to produce optimal results, however these approaches are computationally demanding and can often be computationally infeasible. Therefore it is beneficial to adopt approximate but significantly computationally cheaper methods, such as the MAP estimation approach reviewed in this article – first presented in Price et al. (2018).

However, the approximation error introduced through these approximate methods must be ascertained. Therefore, in this section we benchmark the uncertainties reconstructed via our MAP algorithm to those recovered by the state-of-the-art proximal MCMC algorithm, Px-MALA (Durmus et al., 2016; Pereyra, 2016). Additionally we compare the computational efficiencies of both approaches, highlighting the computational advantages provided by approximate methods.

For simplicity and brevity throughout we will refer to any uncertainties recovered via our aforementioned maximum a posteriori reconstruction method as ‘MAP uncertainties’. Additionally we will refer to the maximum a posteriori reconstruction method discussed throughout this paper as ‘MAP algorithm‘.

4.1 Datasets

We select four test convergence fields: two large scale Buzzard N-body simulation (DeRose et al., 2018; Wechsler, 2018) planar patches selected at random; and two of the largest dark matter halos from the Bolshoi N-body simulation (Klypin et al., 2011). This selection is chosen such as to provide illustrative examples of the uncertainty quantification techniques in both cluster and wider-field weak lensing settings.

4.1.1 Bolshoi N-body

The Bolshoi cluster convergence maps used were produced from 2 of the largest halos in the Bolshoi N-body simulation. These cluster were selected for their large total mass and the complexity of their substructure, as can be seen in Figure 2.

Raw particle data was extracted from the Bolshoi simulation using CosmoSim111https://www.cosmosim.org, and was then gridded into $1024\times 1024$ images. These images inherently contain shot-noise and so were passed through a multi-scale Poisson denoising algorithm before being re-gridded to $256\times 256$ .

The denoising algorithm consisted of a forward Anscombe transform (to Gaussianise the noise), several TV-norm (total-variation) denoising optimizations of different scale, before finally inverse Anscombe transforming. Finally, the images were re-scaled onto $[0,1]$ – a similar denoising approach for Bolshoi N-body simulations was adopted in related articles Lanusse et al. (2016).

4.1.2 Buzzard N-body

The Buzzard v-1.6 shear catalogs are extracted by ray-tracing from a full end-to-end N-body simulation. The origin for tracing is positioned in the corner of the simulation box and so the catalog has $25\%$ sky coverage. Access to the Buzzard simulation catalogs was provided by the LSST-DESC collaboration222http://lsst-desc.org.

In the context of this paper we restrict ourselves to working on the plane, and as such we extracted smaller planar patches. To do so we first project the shear catalog into a coarse HEALPix333http://healpix.sourceforge.net/documentation.php(Gorski et al., 2005) griding (with $N_{\text{side}}$ of 16). Inside each HEALPix pixel we tessellate the largest possible square region, onto which we rotate and project the shear catalog. Here HEALPix pixelisation is solely used for its equal area pixel properties.

After following the above procedure, the Buzzard v-1.6 shear catalog reduces to $\sim 3\times 10^{3}$ planar patches of angular size $\sim 1.2\deg^{2}$ , with $\sim 4\times 10^{6}$ galaxies per patch. In previous work (Price et al., 2018) we utilized 60 of these realisations, but for the purpose of this paper we select at random two planar regions to study, which we grid at a $256\times 256$ resolution. These plots can be seen in Figure 3.

4.2 Methodology

To draw comparisons between our MAP uncertainties and those recovered via Px-MALA we conduct the following set of tests on the aforementioned datasets (see section 4.1).

Initially we transform the ground truth convergence $\kappa^{\text{in}}$ into a clean shear field $\gamma^{\text{in}}$ by

[TABLE]

This clean set of shear measurements is then contaminated with a noise term $n$ to produce mock noisy observations $\gamma$ such that

[TABLE]

For simplicity we choose the noise to be zero mean i.i.d. Gaussian noise of variance $\sigma_{n}^{2}$ – i.e. $n\sim\mathcal{N}(0,\sigma_{n}^{2})$ . In this setting $\sigma_{n}$ is calculated such that the signal to noise ratio (SNR) is 20 dB (decibels) where

[TABLE]

Throughout this uncertainty benchmarking we use a fiducial noise level of 20 dB. For further details on how a noise level in dB maps to quantities such as galaxy number density and pixel size see Price et al. (2018). In particular, we draw the reader’s attention to Price et al. (Table C1 2018). The noise level of 20 dB considered here is somewhat optimistic (corresponding to between 30 and 100 galaxies per square arcmin for a band-limit of $\sim$ 400), which is appropriate for the purposes of benchmarking against MCMC simulations, which is the focus of the current article (less optimistic simulations would simply increase the absolute level of the quantified uncertainties but not their relative level).

We then apply our entire reconstruction pipeline (Price et al., 2018), as briefly outlined in section 2.1, to recover $\kappa^{\text{map}}$ , along with the objective function – with regularization parameter $\mu$ and noise variance $\sigma_{n}^{2}$ . Using these quantities, and the Bayesian framework outlined in sections 2 and 3, we conduct uncertainty quantifications on $\kappa^{\text{map}}$ .

To benchmark the MAP reconstructed uncertainties we first construct an array of local credible interval maps described in section 3 for super-pixel regions of sizes $[4,8,16]$ at $99\%$ confidence. These local credible interval maps are then compared to those recovered from the full MCMC analysis of the posterior.

We adopt two basic statistical measures to compare each set of recovered local credible interval maps: the Pearson correlation coefficient $r$ ; and the recovered SNR. The Pearson correlation coefficient between our MAP local credible interval map $\xi^{\text{map}}\in\mathbb{R}^{N^{\prime}}$ and the Px-MALA local credible interval map $\xi^{\text{px}}\in\mathbb{R}^{N^{\prime}}$ , where $N^{\prime}$ is the dimension of the super-pixel space, is defined to be

[TABLE]

where $\bar{x}=\langle x\rangle$ . The correlation coefficient $r\in[-1,1]$ quantifies the structural similarity between two datasets: 1 indicates maximally positive correlation, 0 indicates no correlation, and -1 indicates maximally negative correlation.

The second of our two statistics is the recovered SNR which is calculated between $\xi^{\text{map}}$ and $\xi^{\text{px}}$ to be

[TABLE]

where $\xi^{\text{px}}$ recovered by Px-MALA is assumed to represent the ground truth Bayesian local credible interval, and $\lVert.\rVert_{2}$ is the $\ell_{2}$ -norm. The SNR is a measure of the absolute similarity of two maps – in this context, rather than the structural correlation which is encoded into $r$ , the SNR is a proxy measure of the relative magnitudes of the two datasets. Additionally, we compute the root mean squared percent error (RMSE),

[TABLE]

Conceptually, the SNR roughly compares the absolute magnitudes of recovered local credible intervals and the Pearson correlation coefficient gives a rough measure of how geometrically similar the local credible intervals are. In this sense the closer $r$ is to 1 the more similar the recovered local credible intervals are, and the higher the SNR the smaller the approximation error given by equation (20). Thus, a positive result is quantified by both large correlation and large SNR.

4.3 Results

As can be seen in Figures 4 and 5 the local credible intervals recovered through our sparse hierarchical Bayesian formalism are at all times larger than those recovered via Px-MALA – confirming that the uncertainties are conservative, as proposed in section 3. Moreover, a strong correlation between the reconstructions can be seen.

The largest correlation coefficients $r$ are observed for super-pixel regions of dimension $16\times 16$ in all cases ( $\langle r\rangle\approx 0.9$ ), peaking as high as 0.98 for the Buzzard 1 extraction – which constitutes a near maximal correlation, and thus an outstanding topological match between the two recovered local credible intervals.

Additionally, in the majority of cases the recovered SNR is $\geq 10$ dB – in some situations rising as high as $\approx 13$ dB (corresponding to $\approx 20\%$ RMSE percent error) – which indicates that the recovered MAP uncertainties are close in magnitude to those recovered via Px-MALA.

However, for super-pixels with dimension $4\times 4$ the structural correlation between $\xi^{\text{map}}$ and $\xi^{\text{px}}$ becomes small – in one case becoming marginally negatively correlated. This is likely to be a direct result of the error given by equation (20) inherited from the definition of the approximate HPD credible region – as this approximation has the side-effect of smoothing the posterior hyper-volume, and for small super-pixels the hyper-volume is typically not smooth, thus the correlation coefficient $r$ decreases.

We conducted additional tests for large $32\times 32$ dimension super-pixels, which revealed a second feature of note. For particularly large super-pixel regions ( $32\times 32$ or larger) the SNR becomes small for both Buzzard maps. This is a result of the assumption that within a super-pixel there exists a stable mean which is roughly uniform across the super-pixel. Clearly, for buzzard type data, on large scales this breaks down and so the recovered local credible intervals deviate from those recovered via Px-MALA. It is important to stress this is a breakdown of the assumptions made when constructing local credible intervals and not an error of the approximate HPD credible region.

The numerical results are summarised in Table 1. Typically, structures of interest in recovered convergence maps cover super-pixel regions of roughly $8\times 8$ to $16\times 16$ , and so for most realistic applications our MAP uncertainties match very well with those recovered through Px-MALA. In most situations weak lensing data is gridded such that it best represents the features of interest, and so structures of interest (by construction) typically fall within $8\times 8$ to $16\times 16$ dimension super-pixel regions for $256\times 256$ gridded images – for higher resolution images the structures of interest, and corresponding optimal super-pixels will follow a similar ratio.

Overall, we find a very close relation between the local credible intervals recovered through our MAP algorithm with those recovered via Px-MALA – a state-of-the-art MCMC algorithm. We find that MAP and Px-MALA local credible intervals are typically strongly topologically correlated (pearson correlation coefficient $\approx 0.9$ ) in addition to being physically tight (RMSE error of $\approx 20-30\%$ ). Moreover, we find that the MAP local credible intervals are, everywhere, larger than the Px-MALA local credible intervals, corroborating the assertion that the approximate HPD level-set threshold $\epsilon_{\alpha}^{\prime}$ is in fact conservative.

We now compare the computational efficiency of our sparse Bayesian reconstruction algorithm against Px-MALA. It is worth noting that all Px-MALA computation was done on a high performance workstation (with 24 CPU cores and 256Gb of memory), whereas all MAP reconstructions were done on a standard 2016 MacBook Air. The computation time for MAP estimation is found to be $\mathcal{O}$ (seconds) whereas the computation time for Px-MALA is found to be $\mathcal{O}$ (days). Specifically, we find the MAP reconstruction algorithm is of $\mathcal{O}(10^{6})$ (typically $\geq 8\times 10^{5}$ ) times faster than the state-of-the-art Px-MALA MCMC algorithm. Moreover, the MAP reconstruction algorithm supports algorithmic structures that can be highly parallelized and distributed.

5 Conclusions

In this article we introduce the concept of local credible intervals (cf. pixel-level error bars) – developed in previous work and applied in the radio-interferometric setting – to the weak lensing setting as an additional form of uncertainty quantification. Utilizing local credible intervals we validate the sparse hierarchical Bayesian mass-mapping formalism presented in previous work (Price et al., 2018). Specifically we compare the local credible intervals recovered via the MAP formalism and those recovered via a complete MCMC analysis – from which the true posterior is effectively recovered.

To compute the asymptotically exact posterior we utilize Px-MALA – a state-of-the-art proximal MCMC algorithm. Using the local credible intervals; we benchmark the MAP uncertainty reconstructions against Px-MALA.

Quantitatively, we compute the Pearson correlation coefficient ( $r$ , as a measure of the correlation between hyper-volume topologies), recovered signal to noise ratio and the root mean squared percentage error (SNR and RMSE, both as measures of how tightly constrained is the absolute error).

We find that for a range of super-pixel dimensions the MAP and Px-MALA uncertainties are strongly topologically correlated ( $r\geq 0.9$ ). Moreover, we find the RMSE to typically be $\sim 20-30\%$ which is tightly constrained when one considers this is a conservative approximation along each of at least $\mathcal{O}(10^{3})$ dimensions.

Additionally we compare the computational efficiency of Px-MALA and our MAP approach. In a $256\times 256$ setting, the computation time of the MAP approach was $\mathcal{O}$ (seconds) whereas the compuattion time for Px-MALA was $\mathcal{O}$ (days). Overall, the MAP approach is shown to be $\mathcal{O}(10^{6})$ times faster than the state-of-the-art Px-MALA algorithm.

A natural progression is to extend the planar sparse Bayesian algorithm to the sphere, which will be the aim of upcoming work – a necessity when dealing with wide-field stage @slowromancapiv@ surveys such as LSST444https://www.lsst.org and EUCLID555http://euclid-ec.org. Additionally, we will expand the set of uncertainty quantification techniques to help propagate principled Bayesian uncertainties into the set of higher-order statistics typically computed on the convergence field.

Acknowledgements

Author contributions are summarised as follows. MAP: methodology, data curation, investigation, software, visualisation, writing - original draft; XC: methodology, investigation, software, writing - review & editing; JDM: conceptualisation, methodology, project administration, supervision, writing - review & editing; MP: methodology, software, writing - review & editing; TDK: methodology, supervision, writing - review & editing.

This paper has undergone internal review in the LSST Dark Energy Science Collaboration. The internal reviewers were Chihway Chang, Tim Eifler, and Francois Lanusse. The author thank the development teams of SOPT. MAP is supported by the Science and Technology Facilities Council (STFC). TDK is supported by a Royal Society University Research Fellowship (URF). This work was also supported by the Engineering and Physical Sciences Research Council (EPSRC) through grant EP/M011089/1 and by the Leverhulme Trust. The DESC acknowledges ongoing support from the Institut National de Physique Nucléaire et de Physique des Particules in France; the Science & Technology Facilities Council in the United Kingdom; and the Department of Energy, the National Science Foundation, and the LSST Corporation in the United States. DESC uses resources of the IN2P3 Computing Center (CC-IN2P3–Lyon/Villeurbanne - France) funded by the Centre National de la Recherche Scientifique; the National Energy Research Scientific Computing Center, a DOE Office of Science User Facility supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231; STFC DiRAC HPC Facilities, funded by UK BIS National E-infrastructure capital grants; and the UK particle physics grid, supported by the GridPP Collaboration. This work was performed in part under DOE Contract DE-AC02-76SF00515.

Appendix A Regularization Marginalization

A prior $f(\kappa)$ is $k$ -homogeneous if $\exists\;k\in\mathbb{R}_{+}$ such that

[TABLE]

As all norms, composite norms and compositions of norms and linear operators (Pereyra et al., 2015) have homogeneity of 1, $k$ in our setting is set to 1. If we wish to infer $\kappa$ without a priori knowledge of $\mu$ (the regularization parameter) then we calculate the normalization factor of $p(\kappa|\mu)$ ,

[TABLE]

For the vast majority of cases of interest, calculating $C(\mu)$ is not feasible, due to the large dimensionality of the integral. However, it was recently shown (Pereyra et al., 2015) that if the prior term $f(\kappa)$ is $k$ -homogeneous then

[TABLE]

A gamma-type hyper-prior is then selected (a typical choice for scale parameters) on $\mu$ such that

[TABLE]

where the hyper-parameters $(\alpha,\beta)$ are very weakly dependent and can be set to 1 (as in Pereyra et al., 2015) and $\mathbb{I}_{C_{\alpha}}$ is an indicator function defined by

[TABLE]

Now construct a joint Bayesian inference problem of $p(\kappa,\mu|\gamma)$ with MAP estimator $(\kappa^{\text{map}},\mu^{\text{map}})\in\mathbb{C}^{N}\times\mathbb{R}_{+}$ . By definition, at this MAP estimator

[TABLE]

where $\mathbf{0}_{i}$ is the $i$ -dimensional null vector. This in turn implies both that

[TABLE]

from which equation (9) follows naturally, and

[TABLE]

Using equations (31, 32, 36) it can be shown (Pereyra et al., 2015) that

[TABLE]

Hereafter we drop the map superscript on $\mu^{\text{map}}$ for simplicity. In order to compute the MAP $\mu$ preliminary iterations are performed as follows:

[TABLE]

where $\alpha$ and $\beta$ are (weakly dependent) hyper-parameters from a gamma-type hyper-prior, $N$ is the dimension of the reconstructed space, and the sufficient statistic $f(\kappa)$ is $k$ -homogeneous. Typically the MAP solution of $\mu$ converges within $\sim 5-10$ iterations, after which $\mu$ is fixed and the optimization in equation (9) is computed.

Bibliography39

The reference list from the paper itself. Each links out to its DOI / PubMed record.

1Alsing et al. (2016) Alsing J., Heavens A., Jaffe A. H., Kiessling A., Wandelt B., Hoffmann T., 2016, MNRAS , 455, 4452 · doi ↗
2Bartelmann & Schneider (2001) Bartelmann M., Schneider P., 2001, Phys. Rep. , 340, 291 · doi ↗
3Cai et al. (2017 a) Cai X., Pereyra M., Mc Ewen J. D., MNRAS in press, 2017 a, preprint, ( ar Xiv:1711.04818 )
4Cai et al. (2017 b) Cai X., Pereyra M., Mc Ewen J. D., MNRAS in press, 2017 b, preprint, ( ar Xiv:1711.04819 )
5Candès & Wakin (2008) Candès E. J., Wakin M. B., 2008, IEEE signal processing magazine, 25, 21
6Carrillo et al. (2012) Carrillo R. E., Mc Ewen J. D., Wiaux Y., 2012, MNRAS , 426, 1223 · doi ↗
7Chang et al. (2018) Chang C., et al., 2018, MNRAS , 475, 3165 · doi ↗
8Coles & Chiang (2000) Coles P., Chiang L.-Y., 2000, Nature , 406, 376 · doi ↗