Machine learning based stellar classification with highly sparse photometry data

Sean Enis Cody; Sebastian Scher; Iain McDonald; Albert Zijlstra; Emma Alexander; and Nick L.J. Cox

arXiv:2410.22869 · astro-ph.IM · October 31, 2024

TL;DR

This paper explores using machine learning, specifically XGBoost combined with the PySSED spectral-energy-distribution (SED) fitting algorithm, to classify stars into nine classes from highly sparse photometric data. It demonstrates initial feasibility, although accuracy remains limited.

Contribution

It introduces an approach that combines ML with spectral-energy-distribution (SED) fitting to classify stars from sparse photometry, and it addresses the challenges of class imbalance and input-variable selection.
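Class imbalance is one of the two challenges named here. A common remedy, shown below as an assumption rather than the authors' documented method, is to weight each training sample by the inverse frequency of its class, using the "balanced" formula w_c = N / (K · n_c), where N is the number of samples, K the number of classes and n_c the count of class c. Weights like these could then be passed as per-sample weights to a gradient-boosting fit (for example the `sample_weight` argument of XGBoost's `XGBClassifier.fit`). The class labels below are hypothetical toy values, not the paper's nine classes.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency class weights: w_c = N / (K * n_c)."""
    counts = Counter(labels)
    n_total = len(labels)
    n_classes = len(counts)
    return {c: n_total / (n_classes * n_c) for c, n_c in counts.items()}


def sample_weights(labels):
    """Map every training label to its class weight (one weight per sample)."""
    weights = balanced_class_weights(labels)
    return [weights[c] for c in labels]


# Hypothetical, heavily imbalanced toy labels (8 common vs 2 rare).
labels = ["common"] * 8 + ["rare"] * 2
print(balanced_class_weights(labels))  # {'common': 0.625, 'rare': 2.5}
```

With this choice each class contributes the same total weight (here 8 × 0.625 = 2 × 2.5 = 5), so the rare class is no longer swamped by the common one during training.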

Findings

1. Overall classifier accuracy of ~0.7, with a macro-averaged F1 score of 0.61.

2. Increasing the number of training samples improves classification for specific star types.

3. Which input variables are used affects model performance, and the effect depends on context.
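Accuracy and macro F1 can diverge sharply on imbalanced data, which helps explain why the reported macro F1 sits below the accuracy. Macro F1 averages the per-class F1 scores with equal weight per class, so a poorly recovered rare class pulls it down even when most stars are classified correctly. The minimal sketch below uses hypothetical toy labels (it does not reproduce the paper's numbers): a classifier that always predicts the majority class.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (each class counts equally)."""
    classes = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Hypothetical toy data: 8 common stars, 2 rare ones, majority-class predictor.
y_true = ["common"] * 8 + ["rare"] * 2
y_pred = ["common"] * 10
print(accuracy(y_true, y_pred))            # 0.8
print(round(macro_f1(y_true, y_pred), 3))  # 0.444
```

Here accuracy is 0.8, but the rare class has F1 = 0, so macro F1 drops to about 0.44.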

Abstract

Identifying stars belonging to different classes is vital in order to build up statistical samples of different phases and pathways of stellar evolution. In the era of surveys covering billions of stars, an automated method of identifying these classes becomes necessary. Many classes of stars are identified based on their emitted spectra. In this paper, we use a combination of the multi-class multi-label Machine Learning (ML) method XGBoost and the PySSED spectral-energy-distribution fitting algorithm to classify stars into nine different classes, based on their photometric data. The classifier is trained on subsets of the SIMBAD database. Particular challenges are the very high sparsity (large fraction of missing values) of the underlying data as well as the high class imbalance. We discuss the different variables available, such as photometric measurements on the one hand, and…
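The abstract singles out very high sparsity, meaning a large fraction of missing values. The minimal sketch below assumes the photometry is held as a star-by-band table with NaN marking a band with no measurement; the band list and magnitudes are hypothetical toy values. NaN is also XGBoost's default marker for missing values, and its tree learner assigns each split a default direction for missing entries, so a table like this can be used without imputation. The coverage-threshold filter in `select_bands` is a hypothetical illustration of one simple form of variable selection, not the procedure used in the paper.

```python
import math

NAN = float("nan")

# Hypothetical bands and magnitudes; NaN marks a missing measurement.
BANDS = ["G", "BP", "RP", "J", "H", "Ks"]
photometry = [
    [12.1, 12.5, 11.6, NAN, NAN, NAN],
    [NAN, NAN, NAN, 9.8, 9.3, 9.1],
    [15.4, NAN, NAN, NAN, NAN, NAN],
]


def sparsity(rows):
    """Fraction of all table cells that are missing (NaN)."""
    cells = [v for row in rows for v in row]
    return sum(math.isnan(v) for v in cells) / len(cells)


def band_coverage(rows, bands):
    """Fraction of stars that have a measurement in each band."""
    return {
        b: sum(not math.isnan(row[i]) for row in rows) / len(rows)
        for i, b in enumerate(bands)
    }


def select_bands(rows, bands, min_coverage):
    """Hypothetical filter: keep bands measured for at least min_coverage of stars."""
    cov = band_coverage(rows, bands)
    return [b for b in bands if cov[b] >= min_coverage]


print(round(sparsity(photometry), 3))          # 0.611
print(select_bands(photometry, BANDS, 0.5))    # ['G']
```

In this toy table 11 of 18 cells are missing, and only one band is measured for at least half of the stars. That tension, between dropping sparse bands and keeping information, is why variable choice matters.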
