Detecting neutral hydrogen at z > 3 in large spectroscopic surveys of quasars
Michele Fumagalli (Milano-Bicocca), Sotiria Fotopoulou, Laura Thomson

TL;DR
This paper develops a random forest pipeline to identify high column-density neutral hydrogen clouds in large quasar surveys, achieving over 90% completeness and purity and providing a substantial catalog of Lyman limit systems (LLSs) at z > 3.
Contribution
The paper introduces a machine learning method, based on a random forest classifier, for efficiently detecting LLSs in large spectroscopic datasets, validated on mock DESI- and WEAVE-like quasar spectra and on real SDSS DR16 survey data.
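To illustrate the random forest idea (bootstrap-resampled trees, each seeing a random subset of features, combined by majority vote), the sketch below implements a deliberately minimal forest whose trees are depth-1 decision stumps. It is not the paper's pipeline: the per-spectrum features (a flux ratio across the candidate break, a mean blue flux, and a pure-noise column), the tree depth, and all parameter values are hypothetical choices made only for this toy.

```python
import random

def gini(labels):
    """Gini impurity of a list of 0/1 labels (0 = no LLS, 1 = LLS)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)

def majority(labels, default=0):
    """Majority class of a list of 0/1 labels; `default` if the list is empty."""
    if not labels:
        return default
    return 1 if 2 * sum(labels) > len(labels) else 0

def fit_stump(X, y, features):
    """Depth-1 tree: choose the (feature, threshold) with the lowest weighted Gini."""
    best = None
    for f in features:
        values = sorted({row[f] for row in X})
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            thr = 0.5 * (lo + hi)
            left = [yi for row, yi in zip(X, y) if row[f] <= thr]
            right = [yi for row, yi in zip(X, y) if row[f] > thr]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, thr, majority(left), majority(right))
    if best is None:  # all candidate features constant: predict the majority class
        m = majority(y)
        return (features[0], float("inf"), m, m)
    return best[1:]

def fit_forest(X, y, n_trees=15, max_features=2, seed=0):
    """Bagged stumps, each fit on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap resample
        feats = rng.sample(range(d), max_features)  # random feature subset
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest

def predict(forest, row):
    """Majority vote over the stumps' 0/1 predictions."""
    votes = sum(lc if row[f] <= thr else rc for f, thr, lc, rc in forest)
    return 1 if 2 * votes > len(forest) else 0

def mock_features(rng, has_lls):
    """Hypothetical features: [blue/red flux ratio, mean blue flux, noise]."""
    if has_lls:
        return [rng.uniform(0.0, 0.4), rng.uniform(0.0, 0.3), rng.random()]
    return [rng.uniform(0.7, 1.1), rng.uniform(0.6, 1.0), rng.random()]

rng = random.Random(42)
y_train = [i % 2 for i in range(120)]
X_train = [mock_features(rng, yi) for yi in y_train]
forest = fit_forest(X_train, y_train)

y_test = [i % 2 for i in range(60)]
X_test = [mock_features(rng, yi) for yi in y_test]
accuracy = sum(predict(forest, x) == yi for x, yi in zip(X_test, y_test)) / len(y_test)
print(f"test accuracy: {accuracy:.2f}")
```

The toy classes are cleanly separated, so the accuracy here says nothing about real spectra; in practice a forest would use deeper trees and features derived from actual flux and noise arrays.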
Findings
Achieved >90% completeness and purity in LLS detection
Identified ~6600 LLSs in SDSS DR16 data at z~3.1-4.0
Measured LLS incidence rate of 2.32 +/- 0.08 per unit redshift
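The findings rest on three quantities: completeness (the fraction of real LLSs recovered), purity (the fraction of claimed LLSs that are real), and the number of LLSs per unit redshift, dN/dz, i.e. the LLS count divided by the total redshift path searched. The sketch below computes them with standard definitions; the counts and path lengths are made up, and the error term is Poisson counting noise only, which need not match how the paper derived its quoted uncertainty.

```python
import math

def completeness(tp, fn):
    """Fraction of true LLSs recovered by the classifier: TP / (TP + FN)."""
    return tp / (tp + fn)

def purity(tp, fp):
    """Fraction of classified LLSs that are real: TP / (TP + FP)."""
    return tp / (tp + fp)

def incidence(n_lls, path_lengths):
    """LLSs per unit redshift, dN/dz = N_LLS / sum(dz_i), with a
    Poisson-only 1-sigma error sqrt(N_LLS) / sum(dz_i)."""
    dz_total = sum(path_lengths)
    return n_lls / dz_total, math.sqrt(n_lls) / dz_total

# Made-up classification counts, for illustration only (not the paper's numbers).
tp, fn, fp = 930, 50, 40
print(f"completeness = {completeness(tp, fn):.3f}, purity = {purity(tp, fp):.3f}")

# Hypothetical survey: 100 sightlines, each searched over a redshift path of 0.5.
dndz, err = incidence(100, [0.5] * 100)
print(f"dN/dz = {dndz:.2f} +/- {err:.2f}")
```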
Abstract
We present a pipeline based on a random forest classifier for the identification of high column-density clouds of neutral hydrogen (i.e. the Lyman limit systems, LLSs) in absorption within large spectroscopic surveys of z > 3 quasars. We test the performance of this method on mock quasar spectra that reproduce the expected data quality of the Dark Energy Spectroscopic Instrument (DESI) and the WHT Enhanced Area Velocity Explorer (WEAVE) surveys, finding >90% completeness and purity for N(HI) > 10^17.2 cm^-2 LLSs against quasars of g < 23 mag at z~3.5-3.7. After training and applying our method to 10,000 quasar spectra at z~3.5-4.0 from the Sloan Digital Sky Survey (Data Release 16), we identify ~6600 LLSs with N(HI) > 10^17.5 cm^-2 between z~3.1-4.0 with a completeness and purity of >90% for the classification of LLSs. Using this sample, we measure a number of LLSs per unit redshift of 2.32…
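The column-density thresholds quoted in the abstract follow from the standard definition of an LLS as an absorber that is optically thick at the Lyman limit: with an HI photoionisation cross-section of roughly 6.3e-18 cm^2 there, tau = N(HI) * sigma reaches about 1 near N(HI) = 10^17.2 cm^-2 (this link is textbook physics, not stated in the abstract). The break appears at 911.76(1 + z) Angstrom in the observed frame. The sketch below evaluates both; the constants are approximate and the function names are illustrative.

```python
import math

SIGMA_LL = 6.30e-18  # approx. HI photoionisation cross-section at the Lyman limit [cm^2]
LAMBDA_LL = 911.76   # approx. rest-frame wavelength of the Lyman limit [Angstrom]

def lyman_limit_tau(log_nhi):
    """Optical depth at the Lyman limit for log10 N(HI) given in cm^-2."""
    return 10.0 ** log_nhi * SIGMA_LL

def observed_lyman_limit(z_abs):
    """Observed-frame wavelength [Angstrom] of the Lyman limit of an absorber at z_abs."""
    return LAMBDA_LL * (1.0 + z_abs)

for log_nhi in (17.2, 17.5):
    tau = lyman_limit_tau(log_nhi)
    print(f"log N(HI) = {log_nhi}: tau = {tau:.2f}, exp(-tau) = {math.exp(-tau):.2f}")

for z_abs in (3.1, 4.0):
    print(f"z_abs = {z_abs}: break at ~{observed_lyman_limit(z_abs):.0f} A")
```

For absorbers at z~3.1-4.0 the break falls at roughly 3700-4600 Angstrom, in the blue optical range covered by SDSS spectra.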
