Debiasing Cosmic Gravitational Wave Sirens
Ryan E. Keeley, Arman Shafieloo, Benjamin L'Huillier, Eric V. Linder

TL;DR
This paper presents a model-independent statistical approach using Gaussian process regression to accurately estimate cosmological parameters from gravitational wave data, reducing bias and enabling rigorous tests of cosmological models.
Contribution
It introduces a Gaussian process-based method for bias-free reconstruction of the Hubble parameter, enhancing the analysis of gravitational wave sirens independently of specific cosmological models.
Findings
Gaussian process regression effectively removes bias in $H(z)$ reconstruction.
Model-independent combination with supernova data improves parameter estimation.
Redshift systematic control must reach spectroscopic precision to prevent bias.
Abstract
Accurate estimation of the Hubble constant, and other cosmological parameters, from distances measured by cosmic gravitational wave sirens requires sufficient allowance for the dark energy evolution. We demonstrate how model independent statistical methods, specifically Gaussian process regression, can remove bias in the reconstruction of $H(z)$, and can be combined model independently with supernova distances. This allows stringent tests of both $H_0$ and $\Lambda$CDM, and can detect unrecognized systematics. We also quantify the redshift systematic control necessary for the use of dark sirens, showing that it must approach spectroscopic precision to avoid significant bias.
Ryan E. Keeley,1 Arman Shafieloo,1,2 Benjamin L’Huillier,1,3 Eric V. Linder1,4,5
1Korea Astronomy and Space Science Institute, Daejeon 34055, Korea
2University of Science and Technology, Daejeon 34113, Korea
3Yonsei University, Seoul 03722, Korea
4Berkeley Center for Cosmological Physics & Berkeley Lab, University of California, Berkeley, CA 94720 USA
5Energetic Cosmos Laboratory, Nazarbayev University, Nur-Sultan, Kazakhstan 010000
E-mail: [email protected]
keywords:
distance scale – gravitational waves – cosmological parameters – dark energy
pubyear: 2019
1 Introduction
Using general relativity (GR) to model the observed waveform of a gravitational wave (GW), the luminosity distance to a GW source can be measured. This makes GWs from mergers of compact objects into standard sirens and offers a potential way to measure the present value of the expansion rate of the Universe, the Hubble constant (Schutz, 1986; Holz & Hughes, 2005; Dalal et al., 2006).
In order to do cosmology with these standard sirens, their redshifts must also be measured. The most straightforward way to obtain the redshift is to use GW systems with electromagnetic (EM) counterpart events (e.g. X-ray or optical flashes associated with the merger), where the redshift comes from the EM counterpart. An alternative to obtain the needed redshift information is to cross-correlate GW events with galaxy redshift surveys, as explored in Zhang (2018); Fishbach et al. (2019). Rather than assigning a redshift and luminosity distance to an individual event, this technique would, in a statistical sense, assign an average luminosity distance to a redshift bin. These ‘dark sirens’ would allow binary black hole mergers to be used as standard sirens. Binary black hole mergers are much louder (compared to binary neutron star mergers) and so are detectable at much higher redshifts and distances, implying many more of them will be seen.
The fact that GR is used to calibrate GW standard sirens makes them particularly useful in mapping the cosmic expansion history. The current standard candles used to map the expansion history are Type Ia supernovae (SN). On their own, however, SN only measure ratios of distances and so can only constrain the shape of the Hubble distance-redshift relation, not its absolute scale. Thus they require calibration.
This calibration is currently done with the distance ladder, including Cepheid periodic variables, and the results of the calibration are summarized as a measurement of $H_0$. This Cepheid measurement of $H_0$ has generated significant interest recently since it is currently discrepant with $\Lambda$CDM inferences from Planck measurements of the CMB (Planck Collaboration et al., 2018) at the 4$\sigma$ level (Riess et al., 2016; Riess et al., 2019; Joudaki et al., 2018; Keeley et al., 2019), potentially pointing to new physics. Since GW do measure an absolute distance scale, they can be used to calibrate the SN distances and estimate $H_0$. Thus GW standard sirens offer a potentially useful cross check on other methods for determining $H_0$ (e.g. Feeney et al., 2019).
However, as we show in Shafieloo et al. (2018), using overly restrictive model dependent techniques to infer $H_0$ from GW datasets runs the risk of yielding substantially biased results. This can arise from assuming the acceleration of the Universe is driven by a cosmological constant (the $\Lambda$ in $\Lambda$CDM) rather than being more general, or appropriately agnostic, about the evolution of the dark energy component. For instance if $\Lambda$CDM were assumed, but the dark energy component were truly dynamical (to the extent allowed by current cosmological datasets), then the inferred values of $H_0$ and the matter density $\Omega_m$ could be biased at the 3$\sigma$ level.
At low redshifts, , this is less of a problem although a simple linear Hubble law is insufficient. Systematics can also arise due to peculiar velocities, as well as coherent velocity flows (e.g. Mortlock et al., 2018; Cooray & Caldwell, 2006; Hui & Greene, 2006). At higher redshift, what we call cosmic standard sirens, such systematics are mitigated. Furthermore there is far more volume and a greater number of sources. If the redshifts for these sources could be measured (e.g. by EM counterparts for binary neutron stars or neutron star black hole mergers, or by cross-correlation in the absence of EM counterparts, as with binary black holes; MacLeod & Hogan, 2008; Petiteau et al., 2008) and a robust model independent technique shown to be effective, then cosmology could be tested to much better accuracy and both $H_0$ and the $\Lambda$CDM model could be put to stringent tests. The Einstein Telescope (Sathyaprakash et al., 2010; Zhao et al., 2011; Taylor & Gair, 2012) will certainly be able to detect events out to these high redshifts, though whether enough of these events will have corresponding EM counterparts, or be well-localized, remains to be seen.
We quantify the application of model independent statistical techniques to accurately and precisely infer $H_0$, and the expansion history of the Universe, from mock GW and SN datasets. In Sec. 2, we lay out how we construct these mock datasets, aiming for future high precision measurements. We apply Gaussian processes in Sec. 3 as a model independent method and demonstrate its success in reconstructing various expansion histories in an unbiased manner. Section 4 addresses the issue of required control of redshift estimation systematics, quantifying the effects of both additive and multiplicative errors. We conclude in Sec. 5.
2 Mock Datasets
In order to test how well our model independent methods can recover alternative cosmologies, we generate mock data according to three cosmologies as in Shafieloo et al. (2018). One case is a $\Lambda$CDM cosmology with parameters km/s/Mpc and . The other two are dynamical dark energy cosmologies with ($w_0$, $w_a$)=() and () respectively that are consistent with the current joint cosmological probe analysis at the 68% level in Scolnic et al. (2018). All models have (, ) = ().
For each cosmology we generate mock GW datasets, ‘Pantheon-like’ SN datasets (Scolnic et al., 2018), and ‘WFIRST-like’ SN datasets. For the GW datasets, we are interested in how well these model independent methods can do compared to model dependent methods in terms of accuracy. To this end, we look at the “Next Next Generation” case from Shafieloo et al. (2018), which has 600 events up to a redshift of . This corresponds roughly to a 3rd generation network such as the Einstein Telescope (Sathyaprakash et al., 2010; Zhao et al., 2011; Taylor & Gair, 2012). In this optimistic scenario (to test strongly whether bias can be overcome out to the maximum redshift), we assume the measured GW source redshift distribution follows the cosmic volume element,
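$$\frac{dN}{dz} \propto \frac{dV}{dz} \propto \frac{r^2(z)}{H(z)}, \qquad r(z) = c\int_0^z \frac{dz'}{H(z')},$$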
and we sample from this distribution.
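As an illustration of this sampling step, the following is a minimal sketch of inverse-CDF sampling from a distribution proportional to the comoving volume element in a flat universe. The function names, the fiducial values `h0=70.0` and `omega_m=0.3`, and the maximum redshift `z_max=2.0` are placeholders chosen for the example, not the values used for the mock datasets.

```python
import numpy as np

C_KM_S = 299792.458  # speed of light in km/s

def hubble_lcdm(z, h0=70.0, omega_m=0.3):
    """H(z) in km/s/Mpc for flat LCDM (parameter values are placeholders)."""
    return h0 * np.sqrt(omega_m * (1.0 + z) ** 3 + 1.0 - omega_m)

def cumulative_trapezoid(y, x):
    """Cumulative trapezoidal integral of y(x), starting from zero."""
    steps = 0.5 * (y[1:] + y[:-1]) * np.diff(x)
    return np.concatenate(([0.0], np.cumsum(steps)))

def sample_redshifts(n_events, z_max, rng, n_grid=2000):
    """Draw redshifts with dN/dz proportional to r(z)^2 / H(z)."""
    z = np.linspace(0.0, z_max, n_grid)
    hz = hubble_lcdm(z)
    r = C_KM_S * cumulative_trapezoid(1.0 / hz, z)  # comoving distance in Mpc
    pdf = r ** 2 / hz                               # unnormalised volume element
    cdf = cumulative_trapezoid(pdf, z)
    cdf /= cdf[-1]
    # Inverse-CDF sampling: push uniform deviates through the tabulated CDF.
    return np.interp(rng.uniform(size=n_events), cdf, z)

rng = np.random.default_rng(0)
z_gw = sample_redshifts(600, z_max=2.0, rng=rng)  # z_max is an assumed value
```

Because the volume element grows steeply with redshift, most of the sampled events land in the upper half of the redshift range, which is why the high-redshift distance uncertainties matter for the statistical power of the sample.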
The redshift distribution for the ‘Pantheon-like’ SN datasets is the same as the actual Pantheon dataset, which includes 1048 SN in the redshift range $0.01 < z < 2.3$ (Scolnic et al., 2018). The redshift distribution for the ‘WFIRST-like’ SN datasets is taken from the WFIRST-AFTA 2015 Report (Spergel et al., 2015), which forecasts the observation of 2725 SN in the range . Each of these redshift distributions is shown in Fig. 1.
For the mock GW datasets, the distances are sampled as in Shafieloo et al. (2018) with a standard deviation of 7% in distance. This 7% is chosen to demonstrate a best-case scenario where the precision on $H_0$ is at the level of 1%. The distance moduli of the ‘Pantheon-like’ SN dataset are sampled with the covariance matrix of the actual Pantheon dataset. The distance moduli of the ‘WFIRST-like’ SN dataset are sampled with forecasted errors from the WFIRST-AFTA 2015 Report (Spergel et al., 2015).
Other studies such as Zhao & Wen (2018) are less optimistic about the usefulness of future GW surveys. The authors calculate the average error on the luminosity distance from GW observations as a function of the uncertainty in the position of the event on the sky. They find that even for the Einstein Telescope the uncertainty on the luminosity distance for an event is around 10% at and the uncertainty increases for larger redshifts. Networks of Einstein-like Telescopes would yield uncertainties smaller than we use for our mock datasets but would still have a redshift dependence. This increased uncertainty, coupled with the fact that more of the sirens are at larger redshifts for a volume limited sample, reduces the statistical power of the GW dataset.
3 Gaussian Process
To infer the expansion history of the Universe in a model independent manner we use Gaussian process (GP) regression (Holsclaw et al., 2010; Shafieloo et al., 2012). This is a statistical sampling method where instead of sampling a parameter space, the sampling is done over the infinite dimensional space of random realizations of families of curves defined by the GP as informed by the data. In other words, instead of the expansion history being determined by $H_0$, $\Omega_m$, etc. and their uncertainties, it is determined by a family of model independent curves subject to the GP covariance function between data points.
For a more accurate reconstruction of $H(z)$ we control the dynamic range by defining the variable
$$\gamma(z) = \ln\left[\frac{H^{\rm fid}(z)}{H(z)}\right].$$
We use $1/H(z)$ since this is what distances are linearly proportional to. The fiducial expansion history $H^{\rm fid}(z)$ is taken to be the best fit $\Lambda$CDM cosmology for the given input cosmology (e.g. each of our three test models). The log ratio also enables a clear test of $\Lambda$CDM – a deviation from zero points to a deviation from $\Lambda$CDM.
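To make the log ratio concrete, the sketch below evaluates $\ln[H^{\rm fid}(z)/H(z)]$ for a flat dynamical dark energy model, using the standard $w(a) = w_0 + w_a(1-a)$ parameterization, against a $\Lambda$CDM fiducial. All numerical values (`70.0`, `0.3`, `w0=-0.9`, `wa=-0.5`) are hypothetical and chosen only for illustration. For simplicity the fiducial here shares the same $H_0$ and matter density as the test model rather than being a best fit.

```python
import numpy as np

def hubble_w0wa(z, h0, omega_m, w0=-1.0, wa=0.0):
    """H(z) in km/s/Mpc for a flat universe with w(a) = w0 + wa * (1 - a)."""
    dark_energy = (1.0 + z) ** (3.0 * (1.0 + w0 + wa)) * np.exp(-3.0 * wa * z / (1.0 + z))
    return h0 * np.sqrt(omega_m * (1.0 + z) ** 3 + (1.0 - omega_m) * dark_energy)

def log_ratio(hz, hz_fid):
    """Log ratio of the fiducial to the actual expansion rate."""
    return np.log(hz_fid / hz)

z = np.linspace(0.0, 2.0, 5)
h_fid = hubble_w0wa(z, 70.0, 0.3)                         # LCDM fiducial (w0=-1, wa=0)
gamma_lcdm = log_ratio(h_fid, h_fid)                      # identically zero
gamma_dyn = log_ratio(hubble_w0wa(z, 70.0, 0.3, w0=-0.9, wa=-0.5), h_fid)
```

The log ratio vanishes when the expansion history matches the fiducial and departs from zero at $z > 0$ once the dark energy evolves, which is the signature the GP is asked to pick up.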
Since the expansion history is expected to be smoothly varying we use the standard squared exponential covariance function
$$\langle \gamma(s_1)\,\gamma(s_2) \rangle = \sigma_f^2 \exp\left[-\frac{(s_1 - s_2)^2}{2\ell^2}\right],$$
where our redshift variable is $s(z) = \ln(1+z)/\ln(1+z_{\max})$, again to control the dynamic range. We take $z_{\max}$ to be the highest redshift of the Pantheon dataset. The hyperparameters $\sigma_f$ and $\ell$ play important roles for both error control and physical insight, with the first characterizing the amplitude of deviations from the fiducial cosmology and the second the correlation scale of fluctuations. Since they have physical meaning and impact on the derived cosmology, they cannot be fixed but must be fit for. As such, we impose a scale invariant prior on these hyperparameters. Their posterior probability distribution functions carry important information. If $\sigma_f$ is consistent with zero this means there is no statistically significant evidence for deviations from the fiducial, i.e. $\Lambda$CDM cosmology (Shafieloo et al., 2013; Aghamousa et al., 2017). If the correlation length $\ell$ is very small this may mean one is fitting for noise in the data; if it is very large this may mean the data are uninformative about the expansion history.
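A minimal sketch of drawing random curves from a squared exponential covariance follows. It assumes an amplitude hyperparameter (`sigma_f`), a correlation length (`ell`), and a log-rescaled redshift variable `s = ln(1+z)/ln(1+z_max)`. The names, the prior ranges, and `z_max = 2.26` are assumptions made for this example, and the log-uniform draw is one common way to realize a scale invariant prior.

```python
import numpy as np

def rescaled_redshift(z, z_max):
    """Map z to s = ln(1+z) / ln(1+z_max), so that s runs from 0 to 1."""
    return np.log1p(z) / np.log1p(z_max)

def squared_exponential(s, sigma_f, ell):
    """Covariance matrix sigma_f^2 * exp(-(s_i - s_j)^2 / (2 ell^2))."""
    diff = s[:, None] - s[None, :]
    return sigma_f ** 2 * np.exp(-0.5 * (diff / ell) ** 2)

def draw_hyperparameters(rng, sigma_range=(1e-3, 1e-1), ell_range=(1e-1, 3.0)):
    """Log-uniform (scale invariant) draws; the ranges are illustrative."""
    sigma_f = np.exp(rng.uniform(*np.log(sigma_range)))
    ell = np.exp(rng.uniform(*np.log(ell_range)))
    return sigma_f, ell

def draw_curves(z, sigma_f, ell, n_draws, rng, z_max):
    """Draw zero-mean GP realizations on the grid z, one per row."""
    cov = squared_exponential(rescaled_redshift(z, z_max), sigma_f, ell)
    # An eigendecomposition with clipping is robust to the tiny negative
    # eigenvalues that smooth kernels produce in floating point.
    eigval, eigvec = np.linalg.eigh(cov)
    root = eigvec * np.sqrt(np.clip(eigval, 0.0, None))
    return (root @ rng.standard_normal((z.size, n_draws))).T

rng = np.random.default_rng(42)
z = np.linspace(0.0, 2.26, 40)
sigma_f, ell = draw_hyperparameters(rng)
curves = draw_curves(z, sigma_f, ell, n_draws=5, rng=rng, z_max=2.26)
```

Small `ell` produces rapidly wiggling curves that can chase noise, while large `ell` produces nearly constant offsets, matching the two limiting behaviours of the correlation length described above.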
The GP regression works by randomly generating a family of functions described by the covariance function, and evaluating the likelihood when comparing to the data. As stated, the fiducial cosmologies ($H^{\rm fid}(z)$) are taken to be the best fit $\Lambda$CDM cosmology for each of the three cases. For the resampled $\Lambda$CDM input cosmology, this is unsurprisingly the input one, namely (. For the input ( cosmology, the best fit $\Lambda$CDM cosmology is (, and for the input ( cosmology the best fit $\Lambda$CDM cosmology is (.
For each of these expansion histories, we then calculate the corresponding luminosity distances,
$$D_L(z) = (1+z)\, c \int_0^z \frac{dz'}{H(z')}.$$
An advantage of GP is that linear operations (integration or differentiation) on a GP are themselves GP. The regression is done by weighting these expansion histories by how well their distances fit the data, which is equivalent to calculating the posterior. The code used for this calculation was adapted from GPHist (Kirkby & Keeley, 2017), which first appeared in Joudaki et al. (2018). This modification is located in an open repository (https://github.com/rekeeley/gphist_GW).
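The weighting procedure can be sketched end to end as below. This is not the GPHist implementation: it is a simplified importance-sampling illustration under assumed inputs, namely a flat $\Lambda$CDM fiducial with placeholder parameters (`70.0`, `0.3`), uniformly distributed mock redshifts, 7% distance errors, and log-uniform hyperparameter ranges chosen by hand. The sign convention assumed is that the GP curve is the log of the fiducial-to-actual expansion rate, so that `H = H_fid * exp(-gamma)`.

```python
import numpy as np

C_KM_S = 299792.458  # speed of light in km/s

def hubble_fid(z, h0=70.0, omega_m=0.3):
    """Fiducial flat LCDM expansion rate (placeholder parameters)."""
    return h0 * np.sqrt(omega_m * (1.0 + z) ** 3 + 1.0 - omega_m)

def lum_dist(z, hz):
    """D_L = (1+z) c int_0^z dz'/H on a grid with z[0] = 0; hz may be (n_draws, n_z)."""
    inv_h = 1.0 / hz
    steps = 0.5 * (inv_h[..., 1:] + inv_h[..., :-1]) * np.diff(z)
    zero = np.zeros(hz.shape[:-1] + (1,))
    integral = np.concatenate([zero, np.cumsum(steps, axis=-1)], axis=-1)
    return (1.0 + z) * C_KM_S * integral

def gp_draws(z, sigma_f, ell, n_draws, rng):
    """Zero-mean squared exponential GP draws in s = ln(1+z)/ln(1+z_max)."""
    s = np.log1p(z) / np.log1p(z[-1])
    cov = sigma_f ** 2 * np.exp(-0.5 * ((s[:, None] - s[None, :]) / ell) ** 2)
    eigval, eigvec = np.linalg.eigh(cov)
    root = eigvec * np.sqrt(np.clip(eigval, 0.0, None))
    return (root @ rng.standard_normal((z.size, n_draws))).T

rng = np.random.default_rng(1)
z_grid = np.linspace(0.0, 2.0, 81)
h_fid = hubble_fid(z_grid)

# Mock data generated from the fiducial model with 7% distance errors.
z_obs = rng.uniform(0.05, 2.0, size=600)  # placeholder redshift distribution
d_true = np.interp(z_obs, z_grid, lum_dist(z_grid, h_fid))
sigma_d = 0.07 * d_true
d_obs = d_true + rng.normal(0.0, sigma_d)

h0_draws, log_like = [], []
for _ in range(40):
    sigma_f = 10.0 ** rng.uniform(-3.0, -1.0)  # log-uniform hyperparameter draws
    ell = 10.0 ** rng.uniform(-1.0, 0.5)
    gamma = gp_draws(z_grid, sigma_f, ell, 50, rng)
    hz = h_fid * np.exp(-gamma)                # undo the log ratio
    d_model = np.array([np.interp(z_obs, z_grid, d) for d in lum_dist(z_grid, hz)])
    log_like.append(-0.5 * np.sum(((d_model - d_obs) / sigma_d) ** 2, axis=1))
    h0_draws.append(hz[:, 0])                  # H(z=0) of each draw

h0_draws = np.concatenate(h0_draws)
log_like = np.concatenate(log_like)
weights = np.exp(log_like - log_like.max())    # likelihood weights = posterior
weights /= weights.sum()
h0_mean = np.sum(weights * h0_draws)
h0_std = np.sqrt(np.sum(weights * (h0_draws - h0_mean) ** 2))
```

Weighting prior draws is the simplest way to show the idea, but it becomes inefficient as the data grow more constraining. A production analysis would need a more careful sampling scheme.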
The posteriors, i.e. the reconstructed expansion histories and distance-redshift relations, are shown in Fig. 2. In both $\Lambda$CDM and $w_0$–$w_a$ example cosmologies, the median of the GP successfully tracks the input values (despite using $\Lambda$CDM as the initial, or fiducial, model). Thus, without needing to make any assumptions about the nature of the true expansion history, it can be recovered accurately using GP regression.
The posteriors of the hyperparameters are shown in Fig. 3. As discussed previously, these posteriors can be used to determine if the reconstruction is meaningfully different from the mean function. Specifically, since for each input cosmology we chose the mean function to be the best-fit flat $\Lambda$CDM cosmology for the specific realization of that input cosmology, we can then conclude that if the posterior for the hyperparameters picks out a value for $\sigma_f$ significantly larger than 0, then these forecasted mock datasets contain information disfavoring flat $\Lambda$CDM. In such a case this points to needing some additional physics, e.g. dark energy or spatial curvature. For our input $\Lambda$CDM cosmology, the posterior for $\sigma_f$ is consistent with 0, indicating the data have no preference for anything beyond the best-fit $\Lambda$CDM cosmology.
However, the resampled $w_0$–$w_a$ cosmologies do show the need for flexibility beyond the best-fit $\Lambda$CDM cosmology. In both cases, the posterior for $\sigma_f$ rules out $\sigma_f = 0$ at the level. In the case, $\sigma_f = 0$ is ruled out at a moderate significance and in the , it is ruled out at a more extreme significance. Thus, when the true input cosmology includes a dark energy equation of state beyond $w = -1$, this methodology is able to detect the data’s preference for information beyond the best-fit $\Lambda$CDM.
Fig. 4 summarizes the results presented in this paper. They agree with the model dependent $w_0$–$w_a$ approach taken in Shafieloo et al. (2018). For each of the different input cosmology cases, the posterior on $H_0$ is shown as calculated from an MCMC sampling assuming $\Lambda$CDM (green, as in Shafieloo et al. (2018)), from a GP regression using only GW data (orange), and from a GP regression using GW and SN data (blue). The bias from assuming $\Lambda$CDM is seen most clearly in the $H_0$–$\Omega_m$ plane but is seen in just the 1D posterior for $H_0$ as well (green lines).
However, when being more agnostic than assuming $\Lambda$CDM, such as using model independent GP regression (orange and blue lines), the bias disappears. We can successfully debias cosmic GW sirens, even when they are the sole distance probe. While accurate, even this next next generation dataset will not be more precise than on $H_0$ (apart from local sirens). If GW are combined with SN datasets – essentially using GW distances instead of the distance ladder to calibrate SN distances – then 1% precision and accuracy can be achieved even with an appropriately agnostic model independent method.
To be concrete about the effect of using less optimistic uncertainties on GW distances, such as in Zhao & Wen (2018), we repeat our analysis using GW distance uncertainties as in that paper and test whether the GP regression can distinguish the dynamical dark energy cosmology from the $\Lambda$CDM best-fit to the data. The results are shown in Fig. 5. The results are less significant with the less optimistic uncertainties on GW distances. Whatever mild significance remains comes largely from the SN datasets, which are now only loosely anchored by the GW dataset.
4 Redshift Errors
Dark sirens rely on cross-correlation with large scale structure to estimate the redshift of the GW event that should be associated with the measured GW luminosity distance. We now examine the accuracy needed for the redshift estimation so as not to bias the cosmological parameter determination. In particular, a systematic constant offset could look similar to a shift in the Hubble constant, while redshift dependence might propagate into biases on the matter density or dark energy equation of state parameters.
We begin with a simple redshift residual systematic of the form
[TABLE]
i.e. an additive and a multiplicative systematic such that . Thus the observable is interpreted as but is really . We can propagate this offset easily into the cosmological parameter estimation through the Fisher bias formalism as (see Eq. 3 of Shafieloo et al. (2018))
[TABLE]
where the observable , is its uncertainty, and is the difference between the distance at the assumed redshift and at the true redshift.
For example, the systematic biases the Hubble constant by or and the matter density by or within a $\Lambda$CDM model. While neither of these is too severe, the bias is nearly orthogonal to the degeneracy direction of the joint probability contour for $H_0$–$\Omega_m$, giving a substantial $\Delta\chi^2$ when fixing to $\Lambda$CDM. Note that this is not purely a shift in $H_0$ because is not linearly proportional to redshift for .
A systematic with some redshift dependence but no low redshift systematic, e.g. , biases $H_0$ more substantially, by , and $\Omega_m$ by , again in a direction such that . Including both systematic contributions, e.g. , gives a nearly linear additive effect in the parameter biases since they are nearly linearly proportional to . However, the $\Delta\chi^2$ reacts more extremely since it is a product of parameter biases and parameter covariances; for example now gives $\Delta\chi^2$=230.
Figure 6 shows examples of parameter bias in the $\Lambda$CDM model space for various . Again note that the joint bias in terms of $\Delta\chi^2$ is much larger than individual parameter biases, being 44, 75, and 230 for the three examples, corresponding to well over .
In the only systematic case, we would require the systematics be controlled to to obtain (i.e. joint confidence bias). The equivalent for only is , and for the more general case of both and , when they are equal then is needed. This basically requires spectroscopic redshift precision for GW sirens to be used as an unbiased cosmological probe. Note that the addition of data from other probes, e.g. to constrain $\Omega_m$, does not help. If we add an external prior of then the statistical errors shrink, and $\Omega_m$ is less biased, but the bias on $H_0$ can actually increase due to covariances. We find the $\Delta\chi^2$ are almost unchanged, with the three systematics cases above giving 39, 74, 223 (recall the statistical contour shrinks, so even a smaller bias can give a larger $\Delta\chi^2$).
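The Fisher bias calculation described in this section can be sketched numerically as follows. The sketch assumes the standard form $\delta p = F^{-1} \sum_k (\partial D_k/\partial p)\, \Delta D_k/\sigma_k^2$ with joint shift $\Delta\chi^2 = \delta p^{T} F\, \delta p$, a flat $\Lambda$CDM model in $(H_0, \Omega_m)$, and the sign convention that $\Delta D$ is the distance at the true redshift minus that at the assumed redshift. The fiducial values, the evenly spaced redshifts, and the offset size `1e-3` are placeholders, so the output is not expected to reproduce the numbers quoted above.

```python
import numpy as np

C_KM_S = 299792.458                   # speed of light in km/s
Z_GRID = np.linspace(0.0, 3.0, 6001)  # integration grid; inputs must stay below z = 3

def lum_dist(z, h0, omega_m):
    """Flat LCDM luminosity distance in Mpc."""
    inv_h = 1.0 / (h0 * np.sqrt(omega_m * (1.0 + Z_GRID) ** 3 + 1.0 - omega_m))
    steps = 0.5 * (inv_h[1:] + inv_h[:-1]) * np.diff(Z_GRID)
    comoving = C_KM_S * np.concatenate(([0.0], np.cumsum(steps)))
    return (1.0 + z) * np.interp(z, Z_GRID, comoving)

def fisher_bias(z_true, dz, p_fid=(70.0, 0.3), frac_err=0.07):
    """Parameter bias and joint delta chi^2 from a redshift offset dz."""
    p = np.array(p_fid, dtype=float)
    z_used = z_true + dz                         # redshift assumed in the fit
    d_used = lum_dist(z_used, *p)
    sigma = frac_err * d_used
    delta_obs = lum_dist(z_true, *p) - d_used    # measured minus modelled distance
    derivs = np.empty((p.size, z_used.size))
    for i in range(p.size):                      # central finite differences
        step = np.zeros_like(p)
        step[i] = 1e-4 * p[i]
        derivs[i] = (lum_dist(z_used, *(p + step))
                     - lum_dist(z_used, *(p - step))) / (2.0 * step[i])
    fisher = (derivs / sigma ** 2) @ derivs.T
    bias = np.linalg.solve(fisher, (derivs / sigma ** 2) @ delta_obs)
    return bias, float(bias @ fisher @ bias)

z_true = np.linspace(0.1, 2.0, 600)
bias_add, dchi2_add = fisher_bias(z_true, dz=1e-3)             # additive offset
bias_mult, dchi2_mult = fisher_bias(z_true, dz=1e-3 * z_true)  # multiplicative offset
```

Because the joint shift is a quadratic form in the biases weighted by the Fisher matrix, individually small parameter shifts that lie across the degeneracy direction can still produce a large $\Delta\chi^2$.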
Returning from Fisher bias to GP regression, we can show how to use GP regression to infer the existence of redshift systematics or other unidentified systematics (see, for example, L’Huillier et al. (2019) for the case of Malmquist bias or source evolution). This is done first by calculating the median of the GP inference for one of the datasets (SN for our case), and using this as the mean function in the GP inference for the other dataset. This allows us to perform the following test: if the posterior of the hyperparameters picks out a value for $\sigma_f$ that is significantly above 0, then there is some unaccounted for discrepancy between the two datasets. Since the two datasets are generated from the same Universe, the conclusion would then be that some sort of systematic bias exists in the data.
To perform an example of this test we use the same mock GW distances and SN distance moduli from a $\Lambda$CDM cosmology, as in the previous section, but the GW redshifts used in the inference are biased by the following equation, , where is a normal distribution with mean and standard deviation . (This is a Monte Carlo version of the case above.)
The result for this systematics test from biased redshifts in the GW dataset is shown in Figure 7. The posterior of the GP hyperparameters picks out a value for $\sigma_f$ that is significantly above 0 (at more than the 99.9% level). This indicates that the GP regression is able to identify a systematic discrepancy between the GW and SN datasets (it even correctly identifies the order of magnitude of the effect). This test can only identify that some sort of systematic bias exists in the data, not that such an offset comes specifically from biased redshift measurements.
5 Conclusions
GW sirens are a new distance measure with some unique characteristics. They have the potential to contribute to mapping the expansion history of the universe, including determining the Hubble constant, if appropriately treated within the cosmological context. In particular, assumptions about the background cosmology can significantly bias the Hubble constant and other parameter estimation. However, we demonstrate that a proper model independent method such as Gaussian process regression can debias the estimation and accurately reconstruct the expansion history including $H_0$.
Furthermore, we illustrate how to use the GP hyperparameters as a test to determine whether the data require a beyond-$\Lambda$CDM cosmology. This can be done by fitting a $\Lambda$CDM model to the data, then using this best-fit model as a mean function for the GP regression. If the posterior of the hyperparameter $\sigma_f$ prefers values significantly different from 0, then that implies the data require an explanation beyond the best-fit $\Lambda$CDM. For a “Next Next Generation” GW siren dataset, coupled with “Pantheon-like” and “WFIRST-like” supernovae datasets, GP regression was able to show that mock data from reasonable $w_0$–$w_a$ cosmologies were incompatible with the best-fit $\Lambda$CDM cosmology, while accurately recovering the best-fit $\Lambda$CDM cosmology from $\Lambda$CDM-generated mock data.
This could also be used to detect unrecognized systematics in a dataset. Using the best fit expansion history from one data set (e.g. SN) as seed for the GW GP, one can again look for consistency with $\sigma_f = 0$.
A particular example of such a systematic could be redshift inaccuracy through indirect estimation of the dark siren redshift. We derived constraints on additive and multiplicative systematics, showing that even an apparently modest single parameter bias in a model dependent fit can actually lead to quite strong bias in joint parameter confidence contours. To remove the bias requires the additive and multiplicative redshift systematics to be controlled at the spectroscopic precision level.
Cosmic GW sirens alone, even from next next generation surveys, will only determine $H_0$ to the accuracy level, using the model independent formalism to debias. However, the Hubble constant and expansion history can potentially be mapped more accurately by using them in conjunction with supernovae and/or local GW sirens, with systematics appropriately controlled. These results are not unique to GW sirens and would be applicable to any distance-redshift dataset.
These results necessarily depend on the assumptions built into the construction of our mock datasets. Using larger uncertainties on the luminosity distances (especially for those at high redshift) as in Zhao & Wen (2018), our results would become less significant. In both the reconstruction of the expansion history and the posterior of the hyperparameters of the GP regression, any deviation away from $\Lambda$CDM would become less significant.
The GP regression code used for this study is made publicly available.
Acknowledgements
We thank the CosKASI 2019 conference “The Correlated Universe” for providing a collaborative venue, and Tamara Davis for discussions about redshift errors and . A.S. would like to acknowledge the support of the National Research Foundation of Korea (NRF-2016R1C1B2016478). A.S. would like to acknowledge the support of the Korea Institute for Advanced Study (KIAS) grant funded by the Korea government. B.L. would like to acknowledge the support of the National Research Foundation of Korea (NRF-2019R1I1A1A01063740). This work is supported in part by the Energetic Cosmos Laboratory and by the U.S. Department of Energy, Office of Science, Office of High Energy Physics, under Award DE-SC-0007867 and contract no. DE-AC02-05CH11231.
References
- Aghamousa A., Hamann J., Shafieloo A., 2017, Journal of Cosmology and Astro-Particle Physics, 2017, 031
- Cooray A., Caldwell R. R., 2006, Phys. Rev. D, 73, 103002
- Dalal N., Holz D. E., Hughes S. A., Jain B., 2006, Phys. Rev. D, 74, 063006
- Feeney S. M., Peiris H. V., Williamson A. R., Nissanke S. M., Mortlock D. J., Alsing J., Scolnic D., 2019, Physical Review Letters, 122, 061105
- Fishbach M., et al., 2019, ApJ, 871, L13
- Holsclaw T., Alam U., Sansó B., Lee H., Heitmann K., Habib S., Higdon D., 2010, Phys. Rev. Lett., 105, 241302
- Holz D. E., Hughes S. A., 2005, ApJ, 629, 15
- Hui L., Greene P. B., 2006, Phys. Rev. D, 73, 123526
