Debiasing inference with approximate covariance matrices and other unidentified biases
Elena Sellentin, Jean-Luc Starck

TL;DR
The paper introduces a non-parametric, objective Bayesian method to detect and correct biases in posterior inferences, especially those caused by approximate covariance matrices, demonstrated on Euclid-like survey data.
Contribution
It presents a novel bias detection and correction technique that assesses posterior coverage and applies debiasing, improving inference reliability in cosmological analyses.
Findings
Approximate covariance matrices bias physical constraints.
Debiasing reduces bias impact on parameter estimation.
Method improves precision in dark energy parameters.
Abstract
When a posterior peaks in unexpected regions of parameter space, new physics has either been discovered, or a bias has not been identified yet. To tell these two cases apart is of paramount importance. We therefore present a method to indicate and mitigate unrecognized biases: Our method runs any pipeline with possibly unknown biases on both simulations and real data. It computes the coverage probability of posteriors, which measures whether posterior volume is a faithful representation of probability or not. If found to be necessary, the posterior is then corrected. This is a non-parametric debiasing procedure which complies with objective Bayesian inference. We use the method to debias inference with approximate covariance matrices and redshift uncertainties. We demonstrate why approximate covariance matrices bias physical constraints, and how this bias can be mitigated. We show that…
| # | angle (arcmin) | # | angle (arcmin) | ||
|---|---|---|---|---|---|
| 66 | 0.713 | 92 | 0.713 | ||
| 40 | 0.713 | 53 | 0.713 | ||
| 105 | 0.713 | 79 | 0.713 | ||
| 1 | 0.713 | 14 | 0.713 | ||
| 42 | 2.956 | 54 | 1.452 | ||
| 67 | 1.452 | 80 | 1.452 | ||
| 106 | 1.452 | 107 | 2.956 | ||
| 119 | 1.452 | 120 | 2.956 | ||
| 95 | 6.017 | 16 | 2.956 | ||
| 28 | 1.452 | 41 | 1.452 |
| N | probability | probability | probability | FoM | |
|---|---|---|---|---|---|
| 68% volume cont. | 90.0% volume cont. | 95.0% volume cont. | 0% | ||
| 625 | 25 | 70% volume cont. | 91.5% volume cont. | 96% volume cont. | 6% |
| 400 | 20 | 71% volume cont. | 92.0% volume cont. | 96% volume cont. | 8% |
| 225 | 15 | 72% volume cont. | 92.5% volume cont. | 96.5% volume cont. | 10% |
| 100 | 10 | 73% volume cont. | 93% volume cont. | 97% volume cont. | 12% |
| 49 | 7 | 75% volume cont. | 94% volume cont. | 97.5% volume cont. | 17% |
| 25 | 5 | 77% volume cont. | 95% volume cont. | 98% volume cont. | 22% |
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Debiasing inference with approximate covariance matrices and other unidentified biases
Elena Sellentin
Jean-Luc Starck
Abstract
When a posterior peaks in unexpected regions of parameter space, new physics has either been discovered, or a bias has not been identified yet. To tell these two cases apart is of paramount importance. We therefore present a method to indicate and mitigate unrecognized biases: Our method runs any pipeline with possibly unknown biases on both simulations and real data. It computes the coverage probability of posteriors, which measures whether posterior volume is a faithful representation of probability or not. If found to be necessary, the posterior is then corrected. This is a non-parametric debiasing procedure which complies with objective Bayesian inference.
We use the method to debias inference with approximate covariance matrices and redshift uncertainties. We demonstrate why approximate covariance matrices bias physical constraints, and how this bias can be mitigated. We show that for a Euclid-like survey, if a traditional likelihood exists, then 25 end-to-end simulations suffice to guarantee that the figure of merit deteriorates maximally by 22 percent, or by 10 percent for 225 simulations. Thus, even a pessimistic analysis of Euclid-like data will still constitute an 25-fold increase in precision on the dark energy parameters in comparison to the state of the art (2018) set by KiDS and DES. We provide a public code of our method.
1 Introduction: unrecognized biases or new physics?
The hardest mistakes to correct for, are those which remained unnoticed, or for which no solution exists yet. Contemporary cosmology actively tackles biases from covariance matrices [1, 2, 3, 4], likelihoods [5, 6], lacking spectroscopic data for redshifts [7, 8], and [9] lists a comprehensive review of many more difficulties in leading weak lensing [10] data analyses.
Known and unknown biases propagate into cosmological parameter constraints, where they cause shifts of the posterior. In the absence of any biases, a posterior peaking in unexpected regions of parameter space must however be interpreted as a sign of new physics, and it is therefore of utmost importance to tell unrecognized biases and new physics apart. Furthermore, this distinction needs to be convincing beyond the boundaries of cosmology, i.e. also be convincing for neighbouring fields such as particle physics.
We therefore here provide a method which safeguards cosmological parameter constraints against recognized or unrecognized biases.
Based on a joint analysis of simulations and the real data with a likelihood, the method leads to unbiased credibility contours for the physical parameters. The method is non-Bayesian (but compatible with Bayesian inference) and therefore applies also when there is no error model available, which a Bayesian mitigation method would require. The thus gained credibility contours have a precise mathematical meaning, namely that of correct ‘coverage probability’ (defined in section 2). Coverage probabilities of Bayesian posteriors objectively measure differences between frequentist and Bayesian parameter constraints. They thereby measure how much the inferred physics depends on our assumptions when analyzing the data, rather than on information contained in the data. Accordingly, reporting the coverage also measures how much (frequentist) particle physicists, and (Bayesian) cosmologists could maximally disagree, given the same data set.
We develop our method in section 2. The method is general, but was developed to address currently outstanding problems of cosmic shear. For example, [1, 2] derive the to-date only known completely bias-free likelihood for estimated covariance matrices. In [11], it was then shown that extra-correlations exist between weak lensing data points, which cannot be captured by any covariance matrix, but affect the inference. In [5], these extra-correlations were studied in detail, showing that the actual weak lensing likelihood must be skewed, and that this skewness translates into parameter biases up to 10 percent of the standard deviation, depending on how the weak lensing data are binned in angular ranges and redshift bins.
A recurrent theme in these analyses was that weak lensing does not easily [12] lend itself to simulations, due to reacting to cosmic structures on the scale of galaxy groups, and due to these structures falling already into the strongly non-linear regime of structure formation. We therefore here seek to minimize the number of simulations, thereby trading for high accuracy of the few simulations, and nonetheless gaining faithful parameter constraints from a joint analysis of data and simulations with a likelihood.
Section 3.4 mitigates parameter biases from approximate likelihoods, where our example uses approximate covariance matrices. Section 3.5 studies photometric redshift uncertaintites and shows that redshift uncertaintites alone (without biased redshifts) can be neglected in current weak lensing surveys. Section 4 shows that 25 end-to-end simulations of a Euclid-like [13] survey, in conjunction with an independent likelihood for this survey, suffice to guarantee that the figure of merit deviates maximally by 22 percent from its optimum. For 225 end-to-end simulations, the figure of merit can be guaranteed to deteriorate by maximally 10 percent. As a result, it can be taken essentially for granted that the upcoming Euclid-like surveys will lead to an 25-fold increase in our knowledge of the dark energy equations of state parameters [14] and .
2 Mitigating unrecognized biases: method and examples
To avoid that unrecognized biases feign new physics, we establish a method that takes as input any existing data analysis pipeline. The method runs the pipeline on simulations and real data alike, and then computes and corrects the coverage probability. We describe why this procedure debiases parameter constraints.
2.1 What do posteriors really measure?
Biases in an inference cause that a posterior, or likelihood, exclude the true parameters too often, for example because the posterior is shifted or too narrow. The notion of ‘too often’, is made mathematically precise by coverage probabilities. The coverage probability of a posterior credibility contour is the fraction of times that this contour includes the true parameters, under repetitions of the experiment. The default expectation of most scientists is that the 68 percent credibility contour (as an example) contains the true parameters 68 percent of the time. In reality, however, the 68 percent posterior credibility contour is constructed such that it contains 68 percent of the posterior volume. Most scientists expect that posterior volume measures (Kolmogorov) probability, but this is not necessarily so. We refer to this expectation by speaking of ‘correct coverage’ for short [15, 16].
Coverage probabilities superficially sound like a frequentist concept, but so-called objective Bayesian analyses [17] achieve the correct coverage probabilities as well [18, 19], due to their explicit construction of priors. Objective Bayesian analyses thus implement the correct noise propagation through mathematical derivations, with the result that posterior volume indeed measures probability. In contrast, the correct coverage is not automatically achieved by so-called subjective Bayesian analyses [20, 21]. These regard priors as subject to choice, or use hyper-parameters, approximate likelihoods [5, 11], or idealized parametric models, with the result that the total Bayesian flow of information is not representative of nature, although mathematically self-consistent [15, 19].
In total, it cannot be taken for granted that posterior volume measures probability as expected, but such potential discrepancies can be reported by quoting coverage probabilities. This is of direct relevance to tensions between experiments.
2.2 Algorithm to measure the coverage probability of posteriors
Any unrecognised or unintended systematic will affect the coverage. Hence, measuring the coverage can detect hidden biases, even if the source of the biases is unknown. Correcting the coverage is then a model-independent solution for mitigating unrecognized biases. We measure and correct the coverage probability with the following algorithm.
A set of fiducial parameters is chosen for simulations of artificial data sets , with . These simulations imitate the real data . We denote posterior densities by curly capital , and associated probabilities, that are scalar rather than densities, by roman .
A state-of-the-art likelihood is then run on all simulations and also on the real data. This results in posteriors from simulations, and the posterior of the real data. For each of these posteriors, 120 credibility contours (or more) are computed. We provide a public code111Public at github.com/elenasellentin/Mitigate_Unrecognized_Biases, where 100 of these contours are equidistant between zero and 99.9 percent posterior credibility. Twenty further contours are equidistant between 95.25 and 99.75 percent credibility. These finely spaced contours enable a reliable coverage correction in the outer tails of a posterior. If the data analysis pipeline contains biases, then the contours resulting from it will not cover correctly.
We denote by fractions of the posterior volume, and accordingly . We consider posterior contours that contain a fraction of the posterior volume and which are isocontours of the posterior. They thus cut the posterior in a certain height below its peak. For each data set , the posterior will be slightly differently shaped, and the height of the th contour thus changes with . We therefore denote this height as roman , where identifies the data set, and identifies the fraction of posterior volume that the contour contains.
Each of the posteriors will take a different (scalar) value at the fiducial parameters of the simulation. We denote this value as roman , where the subscript zero indicates that this is the posterior probability assigned to the fiducial parameters.
The th credibility contour then contains the true parameters if the posterior value at the fiducial parameters is larger than the posterior height of the contour:
[TABLE]
We measure this for all contours, for all posteriors. The coverage probability, , is then the probability that the -posterior credibility region contains (‘covers’) the true parameter values
[TABLE]
The default expectation would be that , meaning that posterior volume measures probability under repetition of the experiment. In contrast, if biases occurred in the analysis, then a credibility contour further out in the posterior will achieve coverage .
For example, the allegedly 95 percent credibility contour of the biased analysis might be found to contain the true parameters only 90 percent of the times. Then it is in reality the 90 percent contour, until the bias is found and corrected. If the bias cannot be found, a mitigation is to discard the biased contours and instead adopt the contours of correct coverage. The new contours will then include the true parameters with the right fraction of times – despite the bias being unknown.
The coverage of equation 2.2 can be estimated from simulations, and we denote its estimator by . This estimator simply counts how often the true parameters fall inside the -contour. If they do not fall inside the contour, they fall automatically outside, and this either-or process indicates that the estimator must (by definition) follow a binomial distribution with success rate and trials. The mean and standard deviation of the binomial distribution then give the mean and standard deviation of our coverage estimator
[TABLE]
Figure 1 shows that the binomial distribution models the noise in the estimated coverage correctly: for the innermost contours, where is low, and for the outermost contours, the standard deviation is the smallest.
If credibility contours cover correctly, then , and the standard deviations will in the following be adopted as error bars.
We provide three simple examples of coverage correction in section 2.3, before we apply the method to cosmological analyses in section 3.
2.3 Examples
Figure 2 illustrates three examples of mitigating unrecognized biases via coverage correction. The real data vector contains 100 data points, drawn from a Gaussian distribution with unit variance. The first fifty data points have mean , the remaining data points have . The parameters to be inferred are and . We simulate 1000 artificial data vectors, by drawing from the same Gaussian. A bias is then introduced in the analysis, the coverage is measured and corrected, resulting in increased contour size.
Example 1, in the left panel of figure 2 is bias free: the data are analyzed with the correct Gaussian likelihood and a flat unbounded prior, which produces automatically the correct coverage for linear parameters, such as . Example 2, in the middle panel, analyzes the data with a biased inverse covariance matrix. The correct inverse covariance would have been , the identity matrix, but the off-diagonal elements were changed to . Section 3 will detail why biased covariance matrices shift posteriors, here we only illustrate that our method is able to correct for this, without needing to know the origin of the error.
Figure 3 plots the measured coverage probabilities of example 2. Due to the hidden bias, the contours are systematically to small, resulting in the seen undercoverage of figure 3.
Finally, example 3 corrects the effects of an unintentionally informative prior , given by
[TABLE]
where is the Gaussian distribution. The prior is so informative that the biased posterior excludes the true parameters (blue point). After coverage correction, the true parameters are again included. Plotted contours lie at 68, 90 and 95 percent posterior volume (before coverage correction, yellow), and at 68, 90 and 95 percent coverage probability (after correction, blue).
2.4 Blind spots of the method
The method detects discrepancies between simulations and the assumptions of a data analysis pipeline. It then corrects for these discrepancies when analyzing the real data. Consequently, it cannot correct for systematic effects which are omitted in both simulations and the analysis pipeline. For example, if neither a likelihood, nor the simulations include a survey mask, then the method cannot correct for imperfections in survey mask handling. Likewise, if the simulations implement precisely the same assumptions as the analysis pipeline, then a self-confirming situation is created, which the method also cannot detect. If the posterior then peaks nonetheless in unexpected regions, then the real data obey other physical or statistical laws than the ones simulated.
If the simulations lack in accuracy, then our method suffers from the same difficulties as any simulation-based inference technique. We shall however find in section 4 that our method requires by many orders of magnitude the fewest number of simulations [3, 2]. This arises due to the the joint analysis with a likelihood, which already contains statistical information which simulations would otherwise need to provide.
3 Applications to cosmology
In this section we apply coverage calibration to cosmic shear (weak lensing) analyses [7, 8, 10], where approximate covariance matrices and redshift uncertaintites often introduce biases of unknown magnitude and of unknown parametric form.
3.1 Why approximate covariance matrices shift posteriors
Approximate covariance matrices are today used in weak lensing [7, 8], but also supernova analyses adapt their covariance matrices to achieve a desired goodness of fit [22, 23]. One often encountered preconception is that such approximate covariance matrices only affect the width of posterior contours, but not where a posterior peaks. We therefore explain why the opposite is true: We show that using an approximate covariance matrix is mathematically the same as fitting to a biased data set, and systematic parameter shifts will ensue.
Consider a Gaussian likelihood, as is currently standard in cosmology
[TABLE]
where, is a -dimensional data vector and the superscript denotes transposition. The mean is a function of the parameters to be inferred, and the covariance matrix is . Parameters are then estimated by sampling the posterior, which is the likelihood times a prior.
To isolate the effect of approximate covariance matrices, we assume unbounded flat priors, and that the data contain no systematic effects. Such sound data are then nonetheless effectively transformed into a biased data set, if an approximate covariance matrix is used in the analysis. This can be seen as follows.
Let the correct covariance matrix be and let be an approximation of it. Both are symmetric positive-definite matrices.
If an analysis uses the correct covariance matrix, the best fit lies where
[TABLE]
is minimal. Equation 3.2 describes that the parameters of the model will adjust to minimize the distance to the data . The best-fitting parameters are then for which . During minimization, statistical compatibility between the mean and the data is measured in units of the inverse covariance matrix. If we exchange the covariance matrix, this distance measure changes. In the units of the new covariance matrix, another will then be closest to the data . Consequently, the parameters will adapt, in order to produce this new mean as well as possible.
We now relate the two matrices via the function
[TABLE]
where a bias occurs if . The left- and right-multiplication by is convenient, but not a specialization. We could equally have written
[TABLE]
where is the matrix of additive inaccuracies. Since is a symmetric matrix, is also by construction symmetric. The matrix is then guaranteed to exist, since for symmetric matrices any congruent matrix is again symmetric for all , and equations 3.3 and 3.4 are both valid ways of describing the systematic uncertainties in a covariance matrix. The corresponding additive uncertainty is then
[TABLE]
Since is unknown, cosmology is forced to use for the likelihood. The thus gained -squared surface is then minimized where
[TABLE]
is minimized. This occurs at a new mean , and the parameters will differ from .
If we conduct a thought experiment where we forget that the new parameters differ, we see that using a biased covariance matrix is akin to analyzing a biased data set with the correct covariance matrix. To see this, we set in our thought experiment. Then, to yield as good a best fit as when using the correct covariance matrix, we have to demand
[TABLE]
This can be solved for , and we find
[TABLE]
This shows that using an incorrect covariance matrix to analyze a sound data set is mathematically equivalent to analyzing the biased data set with the correct covariance matrix. Only if equals the identity matrix does coincide with .
In cosmology, the data are of course fixed. The only free variables to compensate for the bias in the covariance matrix are then the cosmological parameters . The incorrect covariance matrix will consequently force the likelihood to peak at biased parameter values.
In fact, in order for the biased equation 3.6 to reproduce as good a fit as the correct equation 3.2, the relation
[TABLE]
needs to hold. Solving for the now preferred , we find
[TABLE]
The parameters will then attempt to create the mean instead of the mean . Depending on the flexibility of the model, the parameters may not fully succeed in this. In total, we see however that a shift in parameters will ensue, and the direction and magnitude of the shift depends on the drawn data vector , and the biasing matrix , according to equation 3.10.
3.2 Undetectability in Fisher matrix forecasts
The effect of approximate covariance matrices biasing parameters is invisible in Fisher matrix forecasts [24, 13], because on average, one has , and equation 3.10 then predicts . Fisher forecasts will therefore underestimate the total uncertainty. The biasing effect will only occur when analysing real data, where is fixed to the realization on the sky. Equation 3.10 then describes that noise can be incorrectly interpreted as ‘signal’ when the wrong covariance matrix is employed. In the following section we will present an example of thus resulting parameter biases.
3.3 Forcing the KiDS-450 data to prefer the Planck cosmology
Concerning how approximate covariance matrices bias physical parameters, we here illustrate that direction and magnitude of the posterior shift can also be controlled. Additionally, the goodness of fit can also be kept constant. A reduced- of order unity is therefore by no means a good indication that the best-fitting parameters are unbiased.
We illustrate this for the public KiDS-450 data from [7], and force these data to prefer the Planck cosmology. DES [8] analyses could equally have been used. We underline that we here force this transition to the Planck best-fitting cosmology. The aim of this study is dual, namely first to understand which data points are affected, and secondly to understand which procedures must be put in place in order to prevent such shifts.
We work with the original KiDS-450 data vector of 130 elements, which are the real-space estimators and [10, 25] in four tomographic redshift bins and their cross-bins. Our weak lensing setup to compute the theory vector is identical to [7] and our code has been verified against the code of [7], leading to identical results for the theory vectors, given identical input parameters. We use CLASS [26, 27], and Halofit [28] for the non-linear power spectrum. We fix the spectral index and the reduced Hubble constant , to Planck-motivated values of and . Varying the cold dark matter density and the normalization of the power spectrum , we find the best-fitting cosmology for the KiDS-450 data vector when analyzed with the public KiDS-450 covariance matrix to be
[TABLE]
with a 222The high value of this results from having fixed and to the Planck best-fitting values, rather than the KiDS best-fitting values.. By transforming the covariance matrix, we now force the KiDS-450 data to prefer the Planck cosmology. This can be repeated for arbitrarily many parameters.
We precompute the cosmological predictions on a grid, and store the results, in order to make the upcoming analyses of this paper numerically feasible.
We use a Planck best-fitting cosmology with [29]
[TABLE]
The original KiDS-450 analysis [7] leads to posterior constraints on and which are in tension with the Planck constraints. The left panel of figure 4 plots the result of subtracting the KiDS-best fitting cosmology, or the Planck best-fitting cosmology from the KiDS data vector. Subtracting the Planck best-fitting cosmology leads to multiple sequences of adjacent data points being systematically below the mean (blue triangles in the negative domain). A covariance matrix can be tricked into expecting such a situation: By definition we have that the covariance between data points and is
[TABLE]
where denotes taking the expectation value. Since this is an expectation value, a covariance matrix does not simply describe noise, but is rather extremely prescriptive: a positive covariance between data point and describes that if data point is below the mean, then data point is expected to be below the mean as well. We can hence construct a covariance matrix that expects the noise pattern of the blue triangles in figure 4 and interprets it as a strong positive correlation between all data points that are below the mean. The data points whose noise will thereby be most strongly reassessed are listed in table 1, which illustrates that it is consistently the estimators on the lowest angular scales (mostly and arcmins) who will show instability with respect to cosmological parameters, when their noise is reassessed. In this context it is interesting to note that the DES survey [8] excludes on such low scales, which will be partially responsible for why DES posteriors are closer to Planck than KiDS-450 posteriors.
We denote the original KiDS-450 covariance matrix as . The minimal is then reached for
[TABLE]
where the parameter vector is ordered as . We now demand that the KiDS data vector instead produces the Planck cosmology as best fit, and solve for the matrix from equation 3.10 which enables this.
Since the matrix has entries, but equation 3.10 only poses constraints, reconstructing is an under-determined system. There will hence be infinitely many solutions for , which directly implies that trying to debias an approximate covariance matrix is bound to fail.
Here, we now pick out one solution, by demanding to be diagonal, . The required diagonal elements to force the KiDS data to prefer the Planck cosmology then follow to be
[TABLE]
Using the original KiDS-450 data vector, and transforming the inverse KiDS covariance matrix to , the Planck cosmology indeed becomes the new best fit
[TABLE]
The two posteriors arising from analyzing the KiDS data with the two covariance matrices are depicted in figure 4. This figure illustrates the successful translation of the posterior, although data and physical model were not changed. The Planck cosmology now fits the KiDS data with the same goddness of fit (the same ) as the KiDS best-fitting cosmology fitted the KiDS data before. Also visible is, however, that the new posterior is deformed. This side effect arises because the determinant of the covariance matrix was changed333Keeping the determinant constant would impose only one additional constraint, still leading to infinitely many solutions for , again leading to the conclusion that a non-parametric method is needed to debias inference with approximate covariance matrices..
We compute the relative differences between the original and the transformed covariance matrices. The matrix of relative differences is given by
[TABLE]
where and are the indices of the matrix elements and is shorthand for the transformed matrices. The left panel of figure 5 shows the relative difference matrix for the transformed covariance matrix , and the right panel shows the relative difference matrix of the transformed inverse covariance matrix . The difference between the left and the right panel highlights the unpredictability of the matrix inversion: even if most columns in are drastically changed, these changes can be redistributed during the inversion, and it is thus important to judge the accuracy of an inverse covariance matrix directly.
Figure 5 reveals factor 20 changes in certain elements of the inverse covariance matrix. This is to be compared to the DES reanalysis [30] of KiDS-450, where the reanalysis implemented factor 3 changes in the shape noise contribution to elements of the covariance matrix, and parameter shifts were found. We therefore conclude that a debiasing procedure for approximate covariance matrices is indeed needed.
As infinitely many solutions exist to induce a bias such that any arbitrary cosmology becomes the best-fitting cosmology, a parametric Bayesian treatment will not be able to debias the inference. In the following section we will hence reverse the workflow, and accept that any fixed covariance matrix of unknown bias will necessarily be used, and we debias the thus resulting parameter inference with coverage calibration.
3.4 Debiasing inference with approximate covariance matrices of unknown bias
In this section, we illustrate how to compute unbiased credibility contours for cosmological parameters, despite using a covariance matrix of unknown but non-zero bias.
A necessary prerequisite for our method are accurate simulations. Importantly, these simulations are not used to compute a covariance matrix, or its inverse – they are used to debias the inference pipeline which uses the approximate, analytical covariance matrix. To compute a numerical covariance matrix from simulations, one would require , where is the dimension of the data set. To run our debiasing procedure, significantly fewer simulations are needed, and their number does not scale with the dimension of the data set either, see equation 2.3 and section 4.
We again use KiDS-450 as an example. For current weak lensing studies, sufficiently many or accurate simulations do not yet exist to conduct a coverage measurement. KiDS-450 posesses 930 simulations for 100 square degree sky patches [31], but spans by itself approximately 450 square degree. DES uses 18 simulations in [32], where the number of simulations is now the limiting factor.
To demonstrate our method, we therefore generate 100.000444This large number resulted from experimenting with run time constraints. Far fewer are needed in reality, see section 4 for Euclid requirements. Gaussian realizations of data vectors with the KiDS best-fitting cosmology as mean, and with the public KiDS covariance matrix. These shall serve as our simulations replacement.
We then precompute the theory vectors on a grid in the -plane, and then compute the 100.000 posteriors. The posterior per data vector is
[TABLE]
where is the Gaussian likelihood, and are priors on the parameters. We use top-hat priors, with
[TABLE]
Finally, the coverage is computed.
The coverage resulting from this analysis pipeline is plotted in the left of figure 6. The red diagonal line indicates the perfect coverage for an unbiased analysis. Measured coverage probabilities above the red line indicate conservative credibility contours, which are strictly speaking too wide. Measured coverage probabilities below the red line indicate credibility contours which are too narrow. The purple data points depict the measured coverage with error bars. As can be seen, the posterior with the correct covariance matrix undercovers slightly, meaning it is slightly too narrow. This reflects that the adopted priors are slightly informative, as is well known in weak lensing [33, 7, 34].
Next, we analyse the 100.000 simulations purposefully with a biased covariance matrix. We left- and right-multiply the KiDS covariance with a diagonal biasing matrix , whose diagonal elements are drawn from a Gaussian distribution
[TABLE]
The larger the standard deviation , the larger will be the bias in . The relative difference between original covariance matrix, and biased covariance matrix then follows from the mean and standard deviations of . Per matrix element, we have on average
[TABLE]
and using , the variance follows to be
[TABLE]
According to equation 3.21 the bias vanishes on average, and has according to equation 3.22 a standard deviation of . The relative difference between biased and correct covariance matrix is then
[TABLE]
which is independent of matrix indices . The relative differences can be compared to the literature: for example, the DES reanalysis of the KiDS-450 data [30] implemented 40 percent changes in the covariance matrix elements (see Figure 1 of [30]). Current approximate covariance matrices in weak lensing are therefore uncertain to approximately a degree of .
We study such example biases in figure 6, for and . Analyzing the data with such biased covariance matrices causes the posterior to preferentially peak in the wrong region of parameter space, thereby excluding the true cosmology too often. The left panel of figure 6 illustrates this effect by showing how the best-fitting cosmologies are shifted. To each of these new best-fitting cosmologies belongs a new posterior (not plotted) whose credibility contours are of approximately the same shape as those of the original posterior, only centered on the new best fits. The blue and pink coverage measurements in the right panel of figure 6 indicate how quickly the posterior begins to undercover if the biases of such covariance matrices are not mitigated.
Since the left of figure 6 indicates that for current levels of covariance matrix uncertainty ( the best fit scatters over nearly the entire undebiased posterior, we conclude that such uncertainties definitely need to be propagated. We illustrate such a propagation first for the traditional Bayesian marginalization, and then for coverage correction.
Given our bias model with , the posterior of cosmological parameters when marginalized over is given by
[TABLE]
where the uncertainty of is
[TABLE]
The matrix-variate integration is element-wise which becomes quickly numerically prohibitive due to the curse of dimensionality. For the 130-dimensional diagonal used in equation 3.20, it is still feasible, and we implement it via a Monte-Carlo integration. The resulting posterior, , is depicted in figure 7 in blue dotted contours, and is wider than the original KiDS posterior (solid grey contours) due to the marginalization.
The Bayesian marginalization was here only possible since we knew the model which caused the bias. In a realistic analysis, such a model is not known, and we need to propagate the bias blindly via coverage calibration.
We therefore compute the posterior
[TABLE]
of which we know that it must be biased to unknown degree, due to having used the covariance matrix of unknown bias. The measured coverages in figure 6 reveal that for , the credibility contour which contains 68% of the posterior volume, only covers the true cosmology 42% of the time. In contrast, the contour which contains 92% of the posterior volume, included the true cosmology 68% of the time. The 92% credibility contour of the biased posterior is hence only the 68% credibility contour after debiasing. Figure 7 shows that this coverage calibration complies with the Bayesian marginalization, with the advantage that it required no parametric model.
3.5 Mitigating redshift uncertainties by coverage calibration
In this section, the aim is to propagate redshift uncertainties in a non-parametric manner, for the following reasons.
Estimating the redshift of a galaxy becomes difficult when only photometric flux measurements are available. Tomographic weak lensing analyses assign galaxies to distributions , where denotes the bin, . If the redshifts have to be determined photometrically, then the estimated galaxy distributions are uncertain and we write . There will thus exist a probability distribution
[TABLE]
where the curly braces indicate the set of all tomographic bins.
Propagating uncertainty from the through weak lensing analyses is difficult. Bayesian analyses would try to establish the precise functional form of in equation 3.27, and then marginalize over it, resulting in the marginal posterior of cosmological parameters
[TABLE]
This integral is numerically extremely costly555Already figure 8 required a CPU-time of 26 days in parallel on 10 modern Xenon CPUs (100 times as long as a computation for a single redshift realization)., and has to date not yet been solved. Consequently, the current standard approach is to introduce nuisance parameters instead.
Both KiDS and DES introduce nuisance parameters which shift the centers of redshift bins [35, 8]. It has however been found [35], that the nuisance parameter originally introduced for the intrinsic alignment amplitude also fits to redshift uncertainties. This problem occurs because nuisance parameters fit, i.e. they are part of an inverse problem which enables them to compensate for unintended systematics. Note also, that the adopted parametric nuisance model is limited in the sense of not being able to create deformations of which leave the central redshifts invariant.
We therefore wish to study the impact of redshift uncertaintites in isolation. Consequently, we replace the nuisance parameters by a forward model of redshift noise. Since the forward model generates redshift noise only, a confusion with intrinsic alignments is excluded. We then use coverage calibration to propagate the redshift uncertainties into the cosmological parameters.
To implement , we use the published redshift uncertainties from KiDS-450. We use the weighted direct calibration ‘DIR’ setup of KiDS, which matches spectroscopic galaxy observations and galaxies seen in KiDS. On average, DIR causes approximately uncertainties in each point, but we use the exact errors per point.
We implement four different forward models for redshift uncertainties. The first model generates functions whose shape and mean vary. The second model varies the shape only, but keeps the central redshift fixed. This generates uncertainties which cannot be modelled by marginalizing over the mean redshift. For both cases, we use two noise processes: Poisson realizations and the public Bootstrap realizations from KiDS [7].
Examples of the resulting noisy redshift distributions are shown in the left panel of figure 8. For each of these, we compute the theoretical prediction for the KiDS-450 data vector, and analyze it with a Gaussian likelihood, using the public KiDS-covariance matrix. The right panel of Figure 8 reveals that none of the four noise models caused the posteriors to undercover – this means that reported problems with redshifts in KiDS-450 must arise from a bias, or confusion with another systematic effect. Redshift noise in isolation, as here studied, seems to be subdominant to shape noise and cosmic variance, as included in the covariance matrix.
4 Forecasts for dark energy constraints with a Euclid-like survey
As the precision of cosmic surveys improves, the relative impact of formerly negligible biases increases. The upcoming Euclid survey [13], but also its sibling surveys LSST and WFIRST [36], will study the cosmological standard model, and its constituents. The cosmological standard model CDM is based on a cosmological constant and cold dark matter (CDM). In CDM, takes the role of dark energy, and physics beyond the standard model accordingly often introduces additional parameters, and , for extended dark energy phenomenology [26, 27]. In CDM, these parameters take values , and . If the upcoming Euclid analyses exclude this point with high significance, then CDM is ruled out and a new standard model is needed – or a bias occurred.
Due to the complexity of the data analysis, the occurrence of an unrecognized bias is of course possible, but our method is able to tell these biases and new physics apart.
We imagine a Euclid-like survey develops a likelihood, which is as accurate as possible, and which does not rely on simulations. If the likelihood is very accurate, then our method will need to correct only minor outstanding biases, resulting in a minor increase of credibility contours. This likelihood is then to be augmented by few, but very accurate, end-to-end simulations for CDM. We here forecast the number of simulations needed to guarantee that CDM is not discarded due to unrecognized biases.
According to [13], Euclid’s prime scientific target is the determination of the dark energy equation of state parameters and to a precision of
[TABLE]
where is the 1-sigma standard deviation. In a Gaussian approximation, the joint confidence contours of and are elliptical, and the figure of merit (FoM) measures this ellipses area.
For simulations, our coverage estimator has a standard deviation of . For simulations, it will thus detect biases which change confidence contours by more than . It cannot detect biases which change the coverage by less than , and accordingly
[TABLE]
is a conservative lower estimate of the coverage probability, to be interpreted as the ‘most conservative scenario’ of mitigating all possible biases which could not yet be ruled out. A credibility contour which reaches coverage contains the true parameters at least with probability , and likely more.
In figure 9 we forecast such conservative credibility contours for Euclid. The inner filled contour depicts the 68, 90, and 95 percent credibility contours for Euclid at its nominal precision. These correspond to a bias-free inference with infinitely many simulations. If fewer simulations are available, then the conservative contours increase in size, which is depicted in open contours as a function of , as given in table 2.
The right panel of figure 9 can be compared to current constraints from DES and KiDS: both of these surveys measure an equation of state , but keeping fixed to its fiducial value. Both surveys currently achieve approximately [7, 8]. Figure 9 therefore indicates that if Euclid’s data can be augmented by 25 simulations, then either biases can be detected and mitigated, or if no biases are detected but the shown conservative contours are chosen, then Euclid will still achieve approximately 25 times the precision of KiDS and DES for .
5 Discussion
This paper presented a method to mitigate biases, recognized or unrecognized, even when a Bayesian solution cannot be conducted. Our method takes any existing data analysis pipeline as input, and runs it on simulations and the real data alike. It then measures the coverage probability of credibility contours and corrects for it, if found to be off. This produces debiased contours as particle physicists (and many cosmologists) expect them to be: under a repetition of the experiment, the 68 percent confidence contour will contain the true parameters 68 percent of the time, despite the data being analyzed with an imperfect pipeline. Our method can also be understood as a sanity check for any cosmological analysis.
We showed how approximate covariance matrices determine where a likelihood peaks, and that a reduced- of order unity does not indicate an unbiased best-fit. To illustrate both points, we forced the original KiDS-450 data set to peak at the Planck best-fitting cosmology, with the exact same . We then used our method to show how inferences with approximate covariance matrices can be debiased.
We also isolated the impact of uncertain redshifts by using a forward model, since an inverse treatment was found to confuse redshift uncertainties and intrinsic alignments [35]. We found that in isolation, current redshift uncertainties are fully subdominant to shape noise and cosmic variance in current weak lensing analyses. Our study focuses on uncertaintites not biases in redshifts.
Finally, we illustrated that a pessimistic analysis of Euclid-like data, will very likely constrain the dark energy equation of state parameters by a factor of at least 25 better than current KiDS and DES analyses. This statement assumes that 25 end-to-end simulations of Euclid-like data can be provided alongside an independent likelihood.
This paper, in conjunction with [11], now found repeatedly that data vector truncation influences cosmological parameter constraints: the problematic data points always occured at the extreme of angular ranges. This motivates that blinding strategies should be kept for all upcoming surveys.
Our method is applicable to many more examples, and the code is hence public at github.com/elenasellentin/Mitigate_Unrecognized_Biases.
Acknowledgements
We appreciate the public data products of the KiDS consortium, without which this research would not have been possible. It is a pleasure to thank Ruth Durrer and Catherine Heymans for scientific discussions and long-term support.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[1] E. Sellentin and A. F. Heavens, Parameter inference with estimated covariance matrices , MNRAS 456 (Feb., 2016) L 132–L 136, [ ar Xiv:1511.05969 ].
- 2[2] E. Sellentin and A. F. Heavens, Quantifying lost information due to covariance matrix estimation in parameter inference , MNRAS 464 (Feb., 2017) 4658–4665, [ ar Xiv:1609.00504 ].
- 3[3] A. Taylor, B. Joachimi, and T. Kitching, Putting the precision in precision cosmology: How accurate should your data covariance matrix be? , MNRAS 432 (July, 2013) 1928–1946, [ ar Xiv:1212.4359 ].
- 4[4] F. Lacasa and M. Kunz, Inadequacy of internal covariance estimation for super-sample covariance , A&A 604 (Aug., 2017) A 104, [ ar Xiv:1703.03337 ].
- 5[5] E. Sellentin, C. Heymans, and J. Harnois-Déraps, The skewed weak lensing likelihood: why biases arise, despite data and theory being sound. , MNRAS (Apr., 2018) [ ar Xiv:1712.04923 ].
- 6[6] C. Hahn, F. Beutler, M. Sinha, A. Berlind, S. Ho, and D. W. Hogg, Likelihood Non-Gaussianity in Large-Scale Structure Analyses , ar Xiv e-prints (Mar., 2018) [ ar Xiv:1803.06348 ].
- 7[7] H. Hildebrandt, M. Viola, C. Heymans, S. Joudaki, K. Kuijken, C. Blake, T. Erben, B. Joachimi, D. Klaes, L. Miller, C. B. Morrison, R. Nakajima, G. Verdoes Kleijn, A. Amon, A. Choi, G. Covone, J. T. A. de Jong, A. Dvornik, I. Fenech Conti, A. Grado, J. Harnois-Déraps, R. Herbonnet, H. Hoekstra, F. Köhlinger, J. Mc Farland, A. Mead, J. Merten, N. Napolitano, J. A. Peacock, M. Radovich, P. Schneider, P. Simon, E. A. Valentijn, J. L. van den Busch, E. van Uitert, and L. Van Waerbeke, Ki DS-45
- 8[8] M. A. Troxel, N. Mac Crann, J. Zuntz, T. F. Eifler, E. Krause, S. Dodelson, D. Gruen, J. Blazek, O. Friedrich, S. Samuroff, J. Prat, L. F. Secco, C. Davis, J. Weller, and Y. Zhang, Dark Energy Survey Year 1 Results: Cosmological Constraints from Cosmic Shear , Ar Xiv e-prints (Aug., 2017) [ ar Xiv:1708.01538 ].
