Cluster membership analysis with supervised learning and $N$-body simulations
A. Bissekenov, M. Kalambay, E. Abdikamalov, X. Pang, P. Berczik, B. Shukirgaliyev

TL;DR
This study compares supervised machine learning models for star cluster membership analysis using simulated and real Gaia data, finding that Random Forest performs slightly better than the other models, that balanced classes are not required, and that accuracy depends mainly on astrometric parameters.
Contribution
It systematically evaluates five supervised ML models on simulated and Gaia DR3 star cluster data, examining how class balance and the choice of astrometric versus photometric parameters affect classification accuracy.
Findings
Random Forest achieves slightly higher accuracy than other models.
Class balance is not critical for successful learning.
Astrometric parameters are more influential than photometric ones.
Abstract
Membership analysis is an important tool for studying star clusters. There are various approaches to membership determination, including supervised and unsupervised machine learning (ML) methods. We perform membership analysis using the supervised machine learning approach. We train and test our ML models on two sets of star cluster data: snapshots from $N$-body simulations and 21 different clusters from the Gaia Data Release 3 data. We explore five different ML models: Random Forest (RF), Decision Trees, Support Vector Machines, Feed-Forward Neural Networks, and K-Nearest Neighbors. We find that all models produce similar results, with RF showing slightly better accuracy. We find that a balance of classes in datasets is optional for successful learning. The classification accuracy depends strongly on the astrometric parameters. The addition of photometric parameters does not improve…
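As a rough illustration of the supervised approach described in the abstract, the sketch below trains a K-Nearest Neighbors classifier (one of the five model types listed) on an imbalanced synthetic sample and tests it on a balanced one. It is not the authors' pipeline: the helper names (`make_stars`, `knn_predict`, `accuracy`), the three astrometric features (two proper-motion components and parallax), and every numerical value are assumptions chosen only for illustration, and no real Gaia or $N$-body data is used.

```python
import math
import random


def make_stars(n_members, n_field, rng):
    """Build a toy labeled sample; features are (pmra, pmdec, parallax).

    All distribution parameters below are invented for illustration.
    Label 1 = cluster member, label 0 = field star.
    """
    stars = []
    for _ in range(n_members):
        # Members: tightly clustered in proper motion and parallax.
        feats = (rng.gauss(5.0, 0.3), rng.gauss(-3.0, 0.3), rng.gauss(2.0, 0.1))
        stars.append((feats, 1))
    for _ in range(n_field):
        # Field stars: broad distribution in the same feature space.
        feats = (rng.gauss(0.0, 5.0), rng.gauss(0.0, 5.0), rng.gauss(1.0, 1.0))
        stars.append((feats, 0))
    rng.shuffle(stars)
    return [s[0] for s in stars], [s[1] for s in stars]


def knn_predict(train_X, train_y, point, k=5):
    """Classify one star by majority vote among its k nearest training stars."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], point))
    votes = sum(train_y[i] for i in order[:k])
    return 1 if 2 * votes > k else 0


def accuracy(train_X, train_y, test_X, test_y, k=5):
    """Fraction of test stars whose predicted label matches the true label."""
    hits = sum(knn_predict(train_X, train_y, p, k) == t
               for p, t in zip(test_X, test_y))
    return hits / len(test_y)


rng = random.Random(0)
# Imbalanced training set (1 member per 4 field stars), balanced test set.
train_X, train_y = make_stars(100, 400, rng)
test_X, test_y = make_stars(100, 100, rng)
acc = accuracy(train_X, train_y, test_X, test_y)
print(f"accuracy: {acc:.3f}")
```

The features here are left unscaled for brevity; with real data, mixing quantities in different units in a distance-based classifier would normally call for standardization first. Changing the member-to-field ratio passed to `make_stars` gives a quick way to probe how sensitive such a classifier is to class balance on toy data.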
