# Estimating the galaxy two-point correlation function using a split random catalog

**Authors:** E. Keihänen, H. Kurki-Suonio, V. Lindholm, A. Viitanen, A.-S. Suur-Uski, V. Allevato, E. Branchini, F. Marulli, P. Norberg, D. Tavagnacco, S. de la Torre, J. Valiviita, M. Viel, J. Bel, M. Frailis, and A. G. Sánchez

arXiv: 1905.01133 · 2019-10-23

## TL;DR

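This paper shows that splitting the random catalog into subcatalogs the size of the data catalog, and counting random-random pairs only within each subcatalog, speeds up estimation of the galaxy two-point correlation function. For a random catalog fifty times larger than the data catalog, computation time drops by a factor of more than ten without affecting estimator variance or bias, which matters for large upcoming galaxy surveys.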

## Contribution

The paper introduces a split random catalog approach, supported both analytically and numerically, that lowers the cost of the random-random pair count in galaxy two-point correlation function estimators without degrading precision.

## Key findings

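- Splitting the random catalog into subcatalogs of the same size as the data catalog gives the optimal error at fixed computational cost.
- Excluding random-random pairs across different subcatalogs leaves estimator variance and bias unaffected.
- For a random catalog fifty times larger than the data catalog, computation time falls by a factor of more than ten.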
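
The size of the saving can be estimated with a rough pair-count model. The sketch below assumes brute-force pair counting whose cost is proportional to the number of pairs evaluated, which this summary does not state. The names `pair_cost`, `n_data`, and `m_ratio` are illustrative and do not come from the paper.

```python
def pair_cost(n_data, m_ratio, split=False):
    """Number of pairs evaluated, used as a proxy for cost (an assumption)."""
    n_rand = m_ratio * n_data                 # random catalog size
    dd = n_data * (n_data - 1) // 2           # data-data pairs
    dr = n_data * n_rand                      # data-random pairs (full random catalog)
    if split:
        # m_ratio subcatalogs of size n_data; only pairs inside each one are counted
        rr = m_ratio * (n_data * (n_data - 1) // 2)
    else:
        rr = n_rand * (n_rand - 1) // 2       # all random-random pairs
    return dd + dr + rr


n_data = 1_000_000                            # hypothetical data catalog size
speedup = pair_cost(n_data, 50) / pair_cost(n_data, 50, split=True)
print(f"pair-count ratio, standard / split: {speedup:.1f}")
```

Under these assumptions the ratio comes out at roughly 17 for a random catalog fifty times larger than the data catalog, consistent with the "more than ten" quoted in the abstract. Codes that use trees, grids, or a maximum separation will scale differently.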

## Abstract

The two-point correlation function of the galaxy distribution is a key cosmological observable that allows us to constrain the dynamical and geometrical state of our Universe. To measure the correlation function we need to know both the galaxy positions and the expected galaxy density field. The expected field is commonly specified using a Monte Carlo sampling of the volume covered by the survey and, to minimize additional sampling errors, this random catalog has to be much larger than the data catalog. Correlation function estimators compare data-data pair counts to data-random and random-random pair counts, where random-random pairs usually dominate the computational cost. Future redshift surveys will deliver spectroscopic catalogs of tens of millions of galaxies. Given the large number of random objects required to guarantee sub-percent accuracy, it is of paramount importance to improve the efficiency of the algorithm without degrading its precision. We show both analytically and numerically that splitting the random catalog into a number of subcatalogs of the same size as the data catalog when calculating random-random pairs, and excluding pairs across different subcatalogs, provides the optimal error at fixed computational cost. For a random catalog fifty times larger than the data catalog, this reduces the computation time by a factor of more than ten without affecting estimator variance or bias.
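
The splitting described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation. It assumes the widely used Landy-Szalay form of the estimator (the abstract does not name one), brute-force distance matrices that only suit small catalogs, and a random catalog stored in random order. The names `pair_counts` and `xi_split` are hypothetical.

```python
import numpy as np


def pair_counts(a, b, edges, auto=False):
    """Brute-force histogram of pair separations between point sets a and b."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    if auto:
        # a and b are the same set: keep each unordered pair once, drop self-pairs
        d = d[np.triu_indices(len(a), k=1)]
    return np.histogram(d.ravel(), bins=edges)[0].astype(float)


def xi_split(data, randoms, edges):
    """Landy-Szalay-type estimate with random-random pairs counted per subcatalog."""
    n_d, n_r = len(data), len(randoms)
    n_sub = n_r // n_d                        # subcatalogs of the data catalog's size
    pairs_d = n_d * (n_d - 1) / 2
    dd = pair_counts(data, data, edges, auto=True) / pairs_d
    dr = pair_counts(data, randoms, edges) / (n_d * n_r)   # full random catalog
    rr = np.zeros(len(edges) - 1)
    for sub in np.array_split(randoms[: n_sub * n_d], n_sub):
        # pairs across different subcatalogs are never formed
        rr += pair_counts(sub, sub, edges, auto=True)
    rr /= n_sub * pairs_d                     # normalize by the pairs actually counted
    return (dd - 2 * dr + rr) / rr


rng = np.random.default_rng(0)
data = rng.random((200, 3))                   # toy unclustered "galaxies" in a unit box
randoms = rng.random((1000, 3))               # random catalog five times larger
edges = np.linspace(0.05, 0.3, 6)
xi = xi_split(data, randoms, edges)
print(xi)
```

Because the toy data are themselves uniform, the estimate should scatter around zero. The data-random count still uses the full random catalog; only the random-random count is restricted to pairs within a subcatalog.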

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1905.01133/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/1905.01133/full.md

## References

43 references — full list in the complete paper: https://tomesphere.com/paper/1905.01133/full.md

---
Source: https://tomesphere.com/paper/1905.01133