Classifying Stars, Galaxies and AGN in CLAUDS+HSC-SSP Using Gradient Boosted Decision Trees
Anneya Golob, Marcin Sawicki, Andy D. Goulding, Jean Coupon

TL;DR
This paper presents a machine learning pipeline that uses Gradient Boosted Trees to classify stars, galaxies, and AGN in the deep CLAUDS+HSC-SSP dataset, reaching AUC=0.9974 for binary star/galaxy classification and demonstrating good generalization to fainter objects.
Contribution
The study develops and tests a GBT-based pipeline for binary (star/galaxy) and multiclass (star/galaxy/Type I AGN/Type II AGN) classification of catalog objects, showing its effectiveness and potential for large survey data analysis.
Findings
Binary star/galaxy classification with AUC=0.9974
High purity (99.7%) and completeness (99.8%) for galaxy selection at i<25
Promising results for identifying Type I AGN, less so for Type II AGN
Abstract
Classifying catalog objects as stars, galaxies, or AGN is a crucial part of any statistical study of galaxies. We describe our pipeline for binary (star/galaxy) and multiclass (star/galaxy/Type I AGN/Type II AGN) classification developed for the very deep CLAUDS+HSC-SSP dataset. Our method uses the XGBoost implementation of Gradient Boosted Trees (GBT) to train ensembles of models which take photometry, colours, maximum surface brightnesses, and effective radii from all available bands as input, and output the probability that an object belongs to each of the classes under consideration. At i<25, our binary star/galaxy model has AUC=0.9974 and, at the threshold that maximizes our sample's weighted F1 score, selects a sample of galaxies with 99.7% purity and 99.8% completeness. We test the model's ability to generalize to objects fainter than those seen during training…
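The threshold selection and the purity/completeness figures quoted in the abstract can be illustrated with a minimal sketch. It is not the authors' code. It assumes that "weighted F1" means the support-weighted mean of per-class F1 scores (the convention used by scikit-learn's `average="weighted"`), that purity is the fraction of selected objects that are true galaxies, and that completeness is the fraction of true galaxies that are selected. The function names, labels, and probabilities below are hypothetical toy values, not data from the paper.

```python
def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 (labels: 1 = galaxy, 0 = star)."""
    total = len(y_true)
    score = 0.0
    for cls in (0, 1):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += (tp + fn) / total * f1  # weight by class support
    return score


def best_threshold(y_true, p_galaxy, thresholds):
    """Return the first threshold on P(galaxy) that maximizes weighted F1."""
    return max(
        thresholds,
        key=lambda t: weighted_f1(y_true, [int(p >= t) for p in p_galaxy]),
    )


def purity_completeness(y_true, p_galaxy, threshold):
    """Purity and completeness of the galaxy sample selected at `threshold`."""
    selected = [t for t, p in zip(y_true, p_galaxy) if p >= threshold]
    true_positives = sum(selected)
    n_galaxies = sum(y_true)
    purity = true_positives / len(selected) if selected else 0.0
    completeness = true_positives / n_galaxies if n_galaxies else 0.0
    return purity, completeness


# Hypothetical toy sample: true labels and a classifier's P(galaxy) outputs.
y_true = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
p_galaxy = [0.99, 0.97, 0.95, 0.90, 0.85, 0.40, 0.60, 0.20, 0.10, 0.05]
thresholds = [i / 100 for i in range(1, 100)]

t_best = best_threshold(y_true, p_galaxy, thresholds)
purity, completeness = purity_completeness(y_true, p_galaxy, t_best)
print(t_best, purity, completeness)
```

In this toy sample one galaxy has a low probability and one star a moderately high one, so the chosen threshold trades a small loss of completeness for a pure galaxy sample. The same trade-off lies behind the purity and completeness values reported for the real catalog.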
