A First Look at creating mock catalogs with machine learning techniques
Xiaoying Xu, Shirley Ho, Hy Trac, Jeff Schneider, Barnabas Poczos, Michelle Ntampaka

TL;DR
This paper explores two machine learning methods, support vector machines (SVM) and k-nearest-neighbour (kNN) regression, for predicting the number of galaxies in a halo when building mock catalogs. They offer a non-parametric alternative to traditional halo occupation distribution (HOD) models, with promising accuracy.
Contribution
It introduces ML techniques for predicting halo galaxy counts that do not rely on a prescribed relationship between halo properties and N_gal, which adds flexibility in choosing input features for mock catalog generation.
Findings
Achieved MSE of ~0.16 in galaxy number predictions.
Predicted distributions match observed halo distributions.
Galaxy correlation functions are accurate within 5-10% at large scales.
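For readers unfamiliar with the first metric, the mean squared error (MSE) is the average squared difference between predicted and true galaxy counts. The snippet below is a minimal sketch of that definition only; the function name and the toy numbers are illustrative assumptions and are not taken from the paper.

```python
def mean_squared_error(predicted, actual):
    """Average of the squared differences between predicted and true values."""
    if len(predicted) != len(actual):
        raise ValueError("inputs must have the same length")
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


# Toy values for illustration only (not results from the paper).
true_counts = [0, 1, 3, 2]
predicted_counts = [0.5, 1.0, 2.5, 2.0]
mse = mean_squared_error(predicted_counts, true_counts)
```

A lower value means the predicted N_gal sits closer to the true N_gal on average, and because the errors are squared, large misses on individual halos are penalized more heavily than small ones.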
Abstract
We investigate machine learning (ML) techniques for predicting the number of galaxies (N_gal) that occupy a halo, given the halo's properties. These types of mappings are crucial for constructing the mock galaxy catalogs necessary for analyses of large-scale structure. The ML techniques proposed here distinguish themselves from traditional halo occupation distribution (HOD) modeling as they do not assume a prescribed relationship between halo properties and N_gal. In addition, our ML approaches are only dependent on parent halo properties (like HOD methods), which are advantageous over subhalo-based approaches as identifying subhalos correctly is difficult. We test 2 algorithms: support vector machines (SVM) and k-nearest-neighbour (kNN) regression. We take galaxies and halos from the Millennium simulation and predict N_gal by training our algorithms on the following 6 halo properties:…
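As a rough illustration of the kNN regression idea mentioned in the abstract, the sketch below predicts N_gal for a halo as the mean N_gal of its k nearest training halos in feature space. Everything specific here is an assumption made for the example: the two features (a log mass and a placeholder second property), the toy rule that generates the counts, and the function name `knn_predict` are hypothetical, and the paper's six halo properties are not listed in this excerpt. The SVM variant is not shown, since it would normally rely on an external library.

```python
import math
import random


def knn_predict(train_X, train_y, query, k=3):
    """Predict a target as the mean of the k nearest training targets."""
    # Pair each training point's Euclidean distance to the query with its target.
    distances = sorted(
        (math.dist(x, query), y) for x, y in zip(train_X, train_y)
    )
    nearest = distances[:k]
    return sum(y for _, y in nearest) / len(nearest)


# Synthetic halos for illustration only; these are NOT Millennium simulation data.
rng = random.Random(0)
train_X, train_y = [], []
for _ in range(200):
    log_mass = rng.uniform(11.0, 15.0)      # hypothetical feature 1
    second_feature = rng.uniform(0.0, 1.0)  # hypothetical placeholder feature 2
    # Toy rule used only to generate example targets.
    n_gal = max(0, round(10 ** (log_mass - 13.0)))
    train_X.append((log_mass, second_feature))
    train_y.append(n_gal)

# Predict galaxy counts for a low-mass and a high-mass query halo.
low_mass_prediction = knn_predict(train_X, train_y, (11.5, 0.5), k=3)
high_mass_prediction = knn_predict(train_X, train_y, (14.5, 0.5), k=3)
```

Note that the predictor never assumes a functional form linking the features to N_gal; it only averages over similar training halos, which is the sense in which such methods are non-parametric. In practice the features would usually be rescaled to comparable ranges before computing distances.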
