Unsupervised star, galaxy, qso classification: Application of HDBSCAN
Crispin Logan, Sotiria Fotopoulou

TL;DR
This paper presents an unsupervised machine learning approach using HDBSCAN for classifying stars, galaxies, and QSOs in large photometric datasets, achieving high accuracy without the need for labeled training data.
Contribution
The study introduces a novel unsupervised classification method with optimized hyperparameters and feature selection, demonstrating high accuracy and practical application to large astronomical surveys.
Findings
Achieved F1 scores of 98.9 for both the star and galaxy classes, and 93.13 for QSOs.
Validated the method on SDSS spectroscopic data, where it helped correct misclassified spectra.
Created a multiwavelength catalogue of 2.7 million sources with classifications.
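To make the F1 figures above concrete, the following minimal sketch computes a per-class F1 score by treating one class at a time as a binary problem. The function name `f1_score_percent`, the toy label lists, and the 0 to 100 scaling are assumptions made for illustration; they are not taken from the paper, and the toy labels are not real survey data.

```python
def f1_score_percent(true_labels, predicted_labels, target):
    """Per-class F1 on a 0-100 scale, treating `target` as the positive class."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(t == target and p == target for t, p in pairs)  # true positives
    fp = sum(t != target and p == target for t, p in pairs)  # false positives
    fn = sum(t == target and p != target for t, p in pairs)  # false negatives
    if tp == 0:
        return 0.0  # no correct detections, so precision and recall are zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 100.0 * 2.0 * precision * recall / (precision + recall)


# Toy labels, invented for illustration only (hypothetical, not survey data).
truth = ["star", "star", "galaxy", "galaxy", "qso", "qso", "qso", "galaxy"]
pred = ["star", "galaxy", "galaxy", "galaxy", "qso", "qso", "star", "galaxy"]

for cls in ("star", "galaxy", "qso"):
    print(cls, round(f1_score_percent(truth, pred, cls), 2))
```

Because F1 is the harmonic mean of precision and recall, a class scores highly only when it is both rarely contaminated and rarely missed.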
Abstract
Classification will be an important first step for upcoming surveys, such as LSST and Euclid, as well as DESI, 4MOST and MOONS, that will detect billions of new sources. The application of traditional methods of model fitting and colour-colour selections will face significant computational constraints, while machine-learning (ML) methods offer a viable approach to tackle datasets of that volume. While supervised learning methods can perform very well for classification tasks, the creation of representative and accurate training sets is a resource- and time-consuming task. We present a viable alternative using an unsupervised ML method to separate stars, galaxies and QSOs using photometric data. The heart of our work uses HDBSCAN to find the star, galaxy and QSO clusters in a multidimensional colour space. We optimized the hyperparameters and input attributes of three separate HDBSCAN runs,…
