Constraining Galaxy-Halo Connection Using Machine Learning
Abhishek Jana, Lado Samushia

TL;DR
This paper explores machine learning methods to model small-scale galaxy clustering for constraining Halo Occupation Distribution parameters, emphasizing the importance of data processing and algorithm choice for unbiased results.
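In HOD modelling, the parameters being constrained usually enter through mean occupation functions for central and satellite galaxies. The sketch below uses one widely used five-parameter form: an error-function step for centrals, and a power law above a cutoff mass for satellites. The summary does not say which parameterization the paper adopts, so this form, the function names, and the example parameter values are all assumptions for illustration.

```python
import math


def mean_central(log_m, log_m_min, sigma_log_m):
    """<N_cen>(M) = 0.5 * [1 + erf((log M - log M_min) / sigma_logM)].

    Assumed form: a smoothed step in log halo mass.
    """
    return 0.5 * (1.0 + math.erf((log_m - log_m_min) / sigma_log_m))


def mean_satellite(log_m, log_m_min, sigma_log_m, log_m0, log_m1, alpha):
    """<N_sat>(M) = <N_cen>(M) * ((M - M0) / M1) ** alpha for M > M0, else 0.

    Assumed form: a power law in halo mass above a cutoff M0.
    """
    excess = 10.0 ** log_m - 10.0 ** log_m0
    if excess <= 0.0:
        return 0.0  # no satellites below the cutoff mass
    n_cen = mean_central(log_m, log_m_min, sigma_log_m)
    return n_cen * (excess / 10.0 ** log_m1) ** alpha


# Hypothetical parameter values, chosen only for illustration.
params = dict(log_m_min=13.0, sigma_log_m=0.3, log_m0=13.2, log_m1=14.0, alpha=1.0)
for log_m in (12.0, 13.0, 14.0, 15.0):
    n_cen = mean_central(log_m, params["log_m_min"], params["sigma_log_m"])
    n_sat = mean_satellite(log_m, **params)
    print(f"log M = {log_m:4.1f}  <N_cen> = {n_cen:.3f}  <N_sat> = {n_sat:.3f}")
```

An ML emulator then only has to learn the map from a handful of such parameters to the clustering statistics they imply. Evaluating these occupation functions is cheap. The expensive step is populating simulated halos and measuring clustering.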
Contribution
It demonstrates that, with proper data handling and algorithm selection, ML methods can accurately constrain HOD parameters at a much lower computational cost than traditional brute-force methods.
Findings
Artificial neural networks (ANNs) outperform random forests (RF) and ridge regression in predictive accuracy
Restricting HOD prior space improves ML robustness
Combining clustering statistics enhances parameter constraints
Abstract
We investigate the potential of machine learning (ML) methods to model small-scale galaxy clustering for constraining Halo Occupation Distribution (HOD) parameters. Our analysis reveals that while many ML algorithms report good statistical fits, they often yield likelihood contours that are significantly biased in both mean values and variances relative to the true model parameters. This highlights the importance of careful data processing and algorithm selection in ML applications for galaxy clustering, as even seemingly robust methods can lead to biased results if not applied correctly. ML tools offer a promising approach to exploring the HOD parameter space with significantly reduced computational costs compared to traditional brute-force methods if their robustness is established. Using our ANN-based pipeline, we successfully recreate some standard results from recent literature.…
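The workflow the abstract describes can be sketched as follows: train an ML model to predict clustering statistics from HOD parameters, then use its fast predictions in place of expensive mock measurements when exploring the parameter space. This is a minimal illustration under stated assumptions, not the authors' ANN-based pipeline. It swaps the ANN for closed-form ridge regression on quadratic features (ridge regression is one of the baselines in the findings). It also replaces the clustering measurement with a hypothetical toy function `toy_statistic`, and it uses a made-up prior box, a diagonal Gaussian likelihood, and a plain random-walk Metropolis sampler. All names and numbers are hypothetical.

```python
import numpy as np

# Hypothetical prior box for theta = (log_m_min, sigma_log_m, log_m1, alpha).
PRIOR_LO = np.array([12.5, 0.1, 13.5, 0.7])
PRIOR_HI = np.array([13.5, 0.6, 14.5, 1.3])
LOG_R = np.linspace(-1.0, 1.0, 12)  # hypothetical log-separation bins


def toy_statistic(theta):
    """Hypothetical stand-in for an expensive clustering measurement on mocks."""
    log_m_min, sigma, log_m1, alpha = theta
    amplitude = 0.5 * (log_m1 - log_m_min) + 0.1 * sigma * alpha
    slope = -1.5 - 0.3 * (alpha - 1.0) ** 2 + 0.2 * sigma
    return amplitude + slope * LOG_R


def features(theta):
    """Rescale parameters to [0, 1] and build constant, linear and quadratic terms."""
    u = (np.atleast_2d(theta) - PRIOR_LO) / (PRIOR_HI - PRIOR_LO)
    i, j = np.triu_indices(u.shape[1])
    return np.hstack([np.ones((u.shape[0], 1)), u, u[:, i] * u[:, j]])


def fit_ridge(X, Y, lam=1e-8):
    """Closed-form ridge regression: W = (X^T X + lam I)^-1 X^T Y."""
    A = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ Y)


def log_likelihood(theta, W, data, sigma_d):
    """Gaussian log-likelihood with diagonal covariance inside a flat prior box."""
    if np.any(theta < PRIOR_LO) or np.any(theta > PRIOR_HI):
        return -np.inf
    model = (features(theta) @ W)[0]  # emulator prediction, not a new mock
    return -0.5 * np.sum(((data - model) / sigma_d) ** 2)


def metropolis(log_like, start, step, n_steps, rng):
    """Random-walk Metropolis sampler with an isotropic Gaussian proposal."""
    chain = np.empty((n_steps, start.size))
    current, current_ll = start.copy(), log_like(start)
    for k in range(n_steps):
        proposal = current + step * rng.standard_normal(start.size)
        proposal_ll = log_like(proposal)
        if np.log(rng.random()) < proposal_ll - current_ll:
            current, current_ll = proposal, proposal_ll
        chain[k] = current
    return chain


rng = np.random.default_rng(0)

# 1. Train the emulator on parameters drawn uniformly from the prior box.
theta_train = PRIOR_LO + (PRIOR_HI - PRIOR_LO) * rng.random((400, 4))
Y_train = np.array([toy_statistic(t) for t in theta_train])
W = fit_ridge(features(theta_train), Y_train)

# 2. Check emulator accuracy on held-out parameter draws.
theta_test = PRIOR_LO + (PRIOR_HI - PRIOR_LO) * rng.random((100, 4))
Y_test = np.array([toy_statistic(t) for t in theta_test])
emulator_error = np.max(np.abs(features(theta_test) @ W - Y_test))

# 3. Make noisy mock data and sample the emulator-based posterior.
theta_true = np.array([13.0, 0.3, 14.0, 1.0])
sigma_d = 0.02
data = toy_statistic(theta_true) + rng.normal(0.0, sigma_d, LOG_R.size)
start = 0.5 * (PRIOR_LO + PRIOR_HI)
chain = metropolis(lambda t: log_likelihood(t, W, data, sigma_d),
                   start, step=0.02, n_steps=5000, rng=rng)

print("max emulator error:", emulator_error)
print("posterior mean:", chain[1000:].mean(axis=0))
```

The toy statistic is exactly quadratic in the parameters, so the held-out emulator error here is tiny. With a real clustering measurement, comparing the held-out error against the data uncertainty `sigma_d` is one simple check. It helps detect the biased contours the abstract warns about, which can arise even when an emulator reports a good overall fit.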
Taxonomy
Topics: Astronomy and Astrophysical Research · Distributed and Parallel Computing Systems
