Human vs. machine -- 1:3. Joint analysis of classical and ML-based summary statistics of the Lyman-$\alpha$ forest
S. Chang, P. Nayak, M. Walther, D. Gruen

TL;DR
This paper compares traditional (human-defined) and machine learning-based summary statistics for Lyman-$\alpha$ forest data, showing that the ML summaries capture nearly all of the information in the traditional ones and significantly improve parameter constraints.
Contribution
It demonstrates that the ML-based summary contains nearly all of the information in the traditional statistics and offers substantially tighter constraints on the thermal parameters of the intergalactic medium.
Findings
ML summaries nearly encompass traditional statistics' information
ML summaries improve parameter constraints by over a factor of 3
Combining summaries enhances the figure of merit significantly
Abstract
In order to compress and more easily interpret Lyman-$\alpha$ forest (Ly$\alpha$F) datasets, summary statistics, e.g. the power spectrum, are commonly used. However, such summaries unavoidably lose some information, weakening the constraining power on parameters of interest. Recently, machine learning (ML)-based summary approaches have been proposed as an alternative to human-defined statistical measures. This raises a question: can ML-based summaries contain the full information captured by traditional statistics, and vice versa? In this study, we apply three human-defined techniques and one ML-based approach to summarize mock Ly$\alpha$F data from hydrodynamical simulations and infer two thermal parameters of the intergalactic medium, assuming a power-law temperature-density relation. We introduce a metric for measuring the improvement in the figure of merit when combining two…
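To make the two quantities named in the abstract concrete, the following is a minimal sketch, not the authors' code. It assumes the usual power-law form of the temperature-density relation, T = T0 * Delta^(gamma - 1), and a common convention for the figure of merit (FoM), namely the inverse square root of the determinant of the posterior covariance. The paper's own combination metric is not reproduced here, since the abstract above is cut off before defining it. The function names, parameter values, and the two toy posteriors are all hypothetical and synthetic.

```python
import numpy as np


def temperature(delta, T0, gamma):
    """Power-law temperature-density relation T = T0 * delta**(gamma - 1).

    delta is the gas overdensity (density in units of the mean), T0 the
    temperature at mean density, gamma the slope parameter.
    """
    return T0 * np.asarray(delta, dtype=float) ** (gamma - 1.0)


def figure_of_merit(samples):
    """FoM = 1 / sqrt(det(cov)) for samples of shape (n_samples, n_params).

    This is an assumed, commonly used convention: a larger FoM means a
    smaller posterior volume, i.e. tighter constraints.
    """
    cov = np.cov(np.asarray(samples, dtype=float), rowvar=False)
    return 1.0 / np.sqrt(np.linalg.det(cov))


# Synthetic Gaussian "posteriors" over (T0 [K], gamma). These are toy
# numbers chosen for illustration, not results from the paper.
rng = np.random.default_rng(0)
mean = np.array([1.0e4, 1.5])
cov_a = np.array([[4.0e4, 0.0],
                  [0.0, 4.0e-4]])
cov_b = cov_a / 2.0  # hypothetical second summary with half the variance

samples_a = rng.multivariate_normal(mean, cov_a, size=20000)
samples_b = rng.multivariate_normal(mean, cov_b, size=20000)

fom_a = figure_of_merit(samples_a)
fom_b = figure_of_merit(samples_b)
fom_ratio = fom_b / fom_a  # how much tighter summary B is than summary A

print(f"T at delta = 4: {temperature(4.0, 1.0e4, 1.5):.0f} K")
print(f"FoM ratio (B / A): {fom_ratio:.2f}")
```

In two parameters, halving every entry of the covariance divides its determinant by four, so the toy FoM ratio should come out close to 2, up to sampling noise. A real comparison would replace the Gaussian draws with posterior samples obtained from each summary statistic.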
