Machine learning based stellar classification with highly sparse photometry data
Sean Enis Cody, Sebastian Scher, Iain McDonald, Albert Zijlstra, Emma Alexander, and Nick L.J. Cox

TL;DR
This paper explores using machine learning, specifically XGBoost combined with spectral-energy-distribution fitting, to classify stars into nine categories based on highly sparse photometric data, demonstrating initial feasibility despite accuracy limitations.
Contribution
It introduces a novel approach combining ML and spectral fitting for stellar classification with sparse data, addressing class imbalance and variable selection challenges.
Findings
Classifier accuracy ~0.7, macro F1 score 0.61
Increasing samples improves classification for specific star types
Variable choice impacts model performance depending on context
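The gap between the reported accuracy (~0.7) and macro F1 (0.61) is what one would expect under strong class imbalance: accuracy is dominated by the well-populated classes, while macro F1 averages per-class F1 scores with equal weight, so poorly recovered rare classes pull it down. The following is a minimal sketch of the two metrics on toy labels. The class names "common" and "rare" and the label arrays are hypothetical and are not taken from the paper's data or code.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (each class counts equally)."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # Convention used here: a class with no true or predicted members scores 0.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Hypothetical toy data: 8 stars of a common class, 2 of a rare class.
y_true = ["common"] * 8 + ["rare"] * 2
# A classifier that always predicts the common class.
y_pred = ["common"] * 10

acc = accuracy(y_true, y_pred)
mf1 = macro_f1(y_true, y_pred)
print(f"accuracy = {acc:.3f}, macro F1 = {mf1:.3f}")
```

In this toy case the always-majority classifier reaches an accuracy of 0.8 but a macro F1 of only about 0.44, because the rare class contributes an F1 of zero to the average.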
Abstract
Identifying stars belonging to different classes is vital in order to build up statistical samples of different phases and pathways of stellar evolution. In the era of surveys covering billions of stars, an automated method of identifying these classes becomes necessary. Many classes of stars are identified based on their emitted spectra. In this paper, we use a combination of the multi-class multi-label Machine Learning (ML) method XGBoost and the PySSED spectral-energy-distribution fitting algorithm to classify stars into nine different classes, based on their photometric data. The classifier is trained on subsets of the SIMBAD database. Particular challenges are the very high sparsity (large fraction of missing values) of the underlying data as well as the high class imbalance. We discuss the different variables available, such as photometric measurements on the one hand, and…
