The probabilistic random forest applied to the QUBRICS survey: improving the selection of high-redshift quasars with synthetic data
Francesco Guarneri, Giorgio Calderone, Stefano Cristiani, Matteo Porru, Fabio Fontanot, Konstantina Boutsia, Guido Cupani, Andrea Grazian, Valentina D'Odorico, Michael T. Murphy, Angela Bongiorno, Ivano Saccheo, Luciano Nicastro

TL;DR
This paper demonstrates that training a probabilistic random forest on synthetic data and using colours as features improves the selection of high-redshift quasars in the QUBRICS survey, achieving high completeness and yielding confirmed candidates.
Contribution
It introduces the use of synthetic data for training the probabilistic random forest to improve high-z quasar selection in large surveys.
Findings
Synthetic data significantly improves the PRF's performance.
Using colors as features slightly outperforms magnitudes.
High success rate in identifying genuine high-z quasars.
Abstract
Several recent works have focused on the search for bright, high-z quasars (QSOs) in the South. Among them, the QUasars as BRIght beacons for Cosmology in the Southern hemisphere (QUBRICS) survey has now delivered hundreds of new spectroscopically confirmed QSOs selected by means of machine learning algorithms. Building upon the results obtained by introducing the probabilistic random forest (PRF) for the QUBRICS selection, we explore in this work the feasibility of training the algorithm on synthetic data to improve the completeness in the higher redshift bins. We also compare the performance of the algorithm when colours are used as primary features instead of magnitudes. We generate synthetic data based on a composite QSO spectral energy distribution. We first train the PRF to identify QSOs among stars and galaxies, then separate high-z quasars from low-z contaminants. We apply the…
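The comparison between colours and magnitudes as features can be illustrated with a minimal sketch. It assumes that a colour is the difference between the magnitudes in two adjacent bands; the band names, the magnitude values, and the helper `colours_from_magnitudes` are hypothetical and are not taken from the paper. The sketch only shows that colours do not change when a source is made uniformly fainter, which is one reason they can be convenient features.

```python
from typing import Dict, Sequence


def colours_from_magnitudes(
    mags: Dict[str, float], band_order: Sequence[str]
) -> Dict[str, float]:
    """Return adjacent-band colours (e.g. g-r, r-i) from a dict of magnitudes."""
    colours = {}
    for blue, red in zip(band_order, band_order[1:]):
        colours[f"{blue}-{red}"] = mags[blue] - mags[red]
    return colours


# Hypothetical band set and magnitudes, for illustration only.
BANDS = ["g", "r", "i", "z"]
source = {"g": 19.8, "r": 18.4, "i": 18.1, "z": 18.0}

# The same source shifted 1.5 mag fainter in every band.
fainter = {band: mag + 1.5 for band, mag in source.items()}

c1 = colours_from_magnitudes(source, BANDS)
c2 = colours_from_magnitudes(fainter, BANDS)

# Magnitudes differ between the two objects, but the colours agree
# up to floating-point rounding.
for name in c1:
    print(name, round(c1[name], 3), round(c2[name], 3))
```

A classifier trained on such colours sees the same feature vector for both objects, whereas one trained on magnitudes sees two different inputs.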
Taxonomy
Topics: Time Series Analysis and Forecasting · Galaxies: Formation, Evolution, Phenomena
