Machine Learning-based Search of High-redshift Quasars
Guangping Ye, Huanian Zhang, Qingwen Wu

TL;DR
This paper develops a machine learning approach, particularly a random forest model, to identify high-redshift quasars from large survey data, achieving high precision and recall, and provides a publicly available catalog of candidates.
Contribution
The study introduces an effective machine learning pipeline, including imputation and ensemble techniques, to improve high-redshift quasar detection and provides a large candidate catalog with validation.
Findings
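The 11-class random forest reaches 96.43% precision and 91.53% recall for high-redshift quasars on the test set.
Completeness for high-redshift quasars reaches as high as 82.20%.
The final catalog contains 216,949 high-redshift quasar candidates in Legacy Surveys DR9, 476 of them high-probability.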
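To make the precision and recall figures concrete, here is a minimal sketch of how the two metrics are computed for a single class in a multi-class problem. The class names and toy labels are invented for illustration and are not taken from the paper.

```python
def precision_recall(y_true, y_pred, positive):
    """One-vs-rest precision and recall for the `positive` class label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Precision: fraction of predicted positives that are truly positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: fraction of true positives that the classifier recovers.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy labels (hypothetical): "hzq" stands for the high-redshift quasar class,
# the other labels stand for contaminant classes.
y_true = ["hzq", "hzq", "hzq", "hzq", "star", "star", "galaxy", "lzq"]
y_pred = ["hzq", "hzq", "hzq", "star", "star", "hzq", "galaxy", "lzq"]

precision, recall = precision_recall(y_true, y_pred, positive="hzq")
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case three of the four objects predicted as "hzq" are correct, and three of the four true "hzq" objects are recovered, so both metrics come out to 0.75.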
Abstract
We present a machine learning search for high-redshift quasars using the combined photometric data from the DESI Imaging Legacy Surveys and the WISE survey. We explore the imputation of missing values for high-redshift quasars, discuss feature selection, compare different machine learning algorithms, and investigate the choice of class ensemble for the training sample. We find that the random forest model is very effective in separating high-redshift quasars from various contaminants. The 11-class random forest model achieves a precision of 96.43% and a recall of 91.53% for high-redshift quasars on the test set. We demonstrate that the completeness of the high-redshift quasars can reach as high as 82.20%. The final catalog consists of 216,949 high-redshift quasar candidates, 476 of them highly probable, in the entire Legacy Surveys DR9…
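The workflow described in the abstract (impute missing photometry, train a multi-class random forest, score one class on a held-out test set) can be sketched with scikit-learn roughly as follows. This is an illustration under stated assumptions, not the authors' pipeline: the data are synthetic, three classes stand in for the paper's eleven, and median imputation is chosen only as a placeholder because the summary does not say which imputation method was used.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)

# Synthetic stand-in for photometric features (assumption: 6 features, 3 classes).
n_per_class, n_features, n_classes = 200, 6, 3
X = np.vstack([
    rng.normal(loc=2.0 * k, scale=1.0, size=(n_per_class, n_features))
    for k in range(n_classes)
])
y = np.repeat(np.arange(n_classes), n_per_class)

# Blank out about 10% of the entries to mimic missing measurements.
X[rng.random(X.shape) < 0.10] = np.nan

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Imputation followed by a random forest; median imputation is a placeholder choice.
model = make_pipeline(
    SimpleImputer(strategy="median"),
    RandomForestClassifier(n_estimators=100, random_state=0),
)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Score only the class of interest (hypothetical label 0 plays the high-redshift quasars).
target = 0
precision = precision_score(y_test, y_pred, labels=[target], average=None)[0]
recall = recall_score(y_test, y_pred, labels=[target], average=None)[0]
print(f"class {target}: precision={precision:.3f} recall={recall:.3f}")

# Per-object class probabilities, the kind of score a candidate catalog could be ranked by.
proba = model.predict_proba(X_test)
```

Wrapping the imputer and classifier in one pipeline means the imputation statistics are learned from the training split only, which avoids leaking test-set information into the model.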
Taxonomy
Topics: Algorithms and Data Compression
