The Feasibility and Flexibility of Selecting Quasars by Variability Using Ensemble Machine Learning Algorithms
Da-Ming Yang, Zhang-Liang Xie, Jun-Xian Wang

TL;DR
This study demonstrates that ensemble machine learning algorithms can select quasars from variability data with high precision and completeness, and that the approach remains feasible even when observations span only a limited time frame.
Contribution
The paper introduces the use of decision-tree ensemble algorithms for quasar selection based solely on variability parameters, outperforming selection on SDSS colors alone and remaining effective with short-term data.
Findings
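All three models achieved ~98.5% precision and ~97.5% completeness when trained on variability parameters alone.
Variability-based selection outperforms selection on SDSS colors alone (~97.2% precision and ~96.5% completeness).
The spectroscopically confirmed quasar sample among Stripe 82 variable sources is estimated to be ~93% complete.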
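The findings above are quoted as precision and completeness. As a reminder of what those two numbers measure, here is a minimal sketch using the standard definitions: precision is the fraction of sources selected as quasars that really are quasars, and completeness (often called recall) is the fraction of true quasars that are selected. The function name and the toy labels are hypothetical illustrations, not data or code from the paper.

```python
def precision_and_completeness(true_labels, predicted_labels, positive="quasar"):
    """Return (precision, completeness) for the `positive` class.

    precision    = TP / (TP + FP)
    completeness = TP / (TP + FN)   (also known as recall)
    """
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against empty denominators (no selections, or no true quasars).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    completeness = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, completeness


# Toy example (made-up labels): 4 true quasars and 4 true stars.
truth = ["quasar", "quasar", "quasar", "quasar", "star", "star", "star", "star"]
predicted = ["quasar", "quasar", "quasar", "star", "quasar", "star", "star", "star"]

precision, completeness = precision_and_completeness(truth, predicted)
print(f"precision = {precision:.2f}, completeness = {completeness:.2f}")
```

In the toy labels, three of the four selected sources are true quasars and three of the four true quasars are selected, so both quantities come out to 0.75.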
Abstract
In this work we train three decision-tree-based ensemble machine learning algorithms (Random Forest Classifier, Adaptive Boosting, and Gradient Boosting Decision Tree) to study quasar selection in the variable source catalog in SDSS Stripe 82. We build training and test samples (both containing quasars and stars at a 1:1 ratio) using the spectroscopically confirmed sources in SDSS DR14 (including 8330 quasars and 3966 stars). We find that, trained with variation parameters alone, all three models can select quasars with similarly and remarkably high precision and completeness (~98.5% and ~97.5%), even better than when trained with SDSS colors alone (~97.2% and ~96.5%), consistent with previous studies. Through applying the trained models on the variable sources without spectroscopic identifications, we estimate the spectroscopically confirmed quasar sample in Stripe 82 variable…
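The comparison described in the abstract, three decision-tree ensembles trained on the same features and scored on precision and completeness, can be sketched as follows. This assumes scikit-learn as the library (the abstract does not name one) and uses a synthetic, balanced two-class dataset as a stand-in for the variability parameters; the feature count, hyperparameters, and split fraction are all illustrative assumptions, not the authors' settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for variability features (NOT the paper's data):
# class 1 plays the role of "quasar", class 0 the role of "star", mixed 1:1.
X, y = make_classification(
    n_samples=2000,
    n_features=8,
    n_informative=5,
    weights=[0.5, 0.5],
    random_state=0,
)

# Stratified split keeps the 1:1 class ratio in both training and test samples.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Hyperparameters here are arbitrary illustrative choices.
models = {
    "RandomForest": RandomForestClassifier(n_estimators=100, random_state=0),
    "AdaBoost": AdaBoostClassifier(n_estimators=100, random_state=0),
    "GradientBoosting": GradientBoostingClassifier(n_estimators=100, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    # Completeness of the positive class is what scikit-learn calls recall.
    scores[name] = (
        precision_score(y_test, y_pred),
        recall_score(y_test, y_pred),
    )
    print(f"{name}: precision={scores[name][0]:.3f}, completeness={scores[name][1]:.3f}")
```

The numbers this prints describe only the synthetic data. Reproducing the figures quoted in the abstract would require the actual Stripe 82 variability parameters and the authors' configuration.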
