Leaves on trees: identifying halo stars with extreme gradient boosted trees
Jovan Veljanoski, Amina Helmi, Maarten Breddels, Lorenzo Posti

TL;DR
This paper develops a supervised machine learning classifier based on Gradient Boosted Trees to identify halo stars in Gaia data, including stars whose full phase-space information is not available. Reliably identified halo stars can then be searched for evidence of the merger events that contributed to the growth of the Galaxy.
Contribution
The study introduces a novel supervised classifier trained on Gaia simulations and catalog cross-matches to reliably identify halo stars with limited data.
Findings
Achieves 90% recovery of halo stars with full phase-space data
Detects 337 high-confidence halo stars in TGAS data
Performance degrades with large parallax errors
Abstract
Extended stellar haloes are a natural by-product of the hierarchical formation of massive galaxies. If merging is a non-negligible factor in the growth of our Galaxy, evidence of such events should be encoded in its stellar halo. Reliable identification of genuine halo stars is, however, a challenging task. The first Gaia data release contains the positions, parallaxes and proper motions for over 2 million stars, mostly in the Solar neighbourhood. Gaia DR2 will enlarge this sample to over 1.5 billion stars, the brightest ~5 million of which will have full phase-space information. Our aim is to develop a machine learning model to reliably identify halo stars, even when their full phase-space information is not available. We use the Gradient Boosted Trees algorithm to build a supervised halo star classifier. The classifier is trained on a sample extracted from the Gaia Universe Model…
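To make the idea of a supervised gradient boosted classifier concrete, here is a minimal sketch in pure Python. It is not the authors' implementation: the paper's title points to extreme gradient boosted trees, whereas this toy version boosts single-split decision stumps on the log-loss gradient. The two features (total speed and vertical speed), the velocity dispersions, the 20% halo fraction and all function names are assumptions invented for illustration, not values from the paper.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, residuals):
    """Fit a one-split regression stump to the residuals (least squares)."""
    n, total, best = len(X), sum(residuals), None
    for j in range(len(X[0])):
        order = sorted(range(n), key=lambda i: X[i][j])
        left_sum = 0.0
        for k in range(1, n):
            left_sum += residuals[order[k - 1]]
            lo, hi = X[order[k - 1]][j], X[order[k]][j]
            if lo == hi:
                continue  # cannot split between identical feature values
            right_sum = total - left_sum
            # Maximising this gain is equivalent to minimising squared error.
            gain = left_sum ** 2 / k + right_sum ** 2 / (n - k)
            if best is None or gain > best[0]:
                best = (gain, j, (lo + hi) / 2, left_sum / k, right_sum / (n - k))
    return best[1:]  # (feature index, threshold, left value, right value)


def fit_gbt(X, y, n_rounds=50, learning_rate=0.3):
    """Gradient boosting for binary labels: each stump fits y - p."""
    p0 = sum(y) / len(y)
    base = math.log(p0 / (1.0 - p0))  # initial score is the prior log-odds
    scores = [base] * len(X)
    stumps = []
    for _ in range(n_rounds):
        # Negative gradient of the log loss with respect to the score.
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        j, thr, left, right = fit_stump(X, residuals)
        stumps.append((j, thr, left, right))
        scores = [s + learning_rate * (left if x[j] <= thr else right)
                  for x, s in zip(X, scores)]
    return base, learning_rate, stumps


def predict_proba(model, x):
    """Return the model's halo membership probability for one star."""
    base, lr, stumps = model
    score = base + lr * sum(left if x[j] <= thr else right
                            for j, thr, left, right in stumps)
    return sigmoid(score)


def make_toy_sample(n, halo_fraction, rng):
    """Toy stars: halo members get a larger (made-up) velocity dispersion."""
    X, y = [], []
    for _ in range(n):
        is_halo = rng.random() < halo_fraction
        sigma = 150.0 if is_halo else 30.0  # km/s, illustrative values only
        v = [rng.gauss(0.0, sigma) for _ in range(3)]
        X.append([math.sqrt(sum(c * c for c in v)), abs(v[2])])
        y.append(1 if is_halo else 0)
    return X, y


rng = random.Random(42)
X_train, y_train = make_toy_sample(600, 0.2, rng)
X_test, y_test = make_toy_sample(300, 0.2, rng)  # unseen test set

model = fit_gbt(X_train, y_train)
predicted = [1 if predict_proba(model, x) > 0.5 else 0 for x in X_test]

true_pos = sum(1 for p, t in zip(predicted, y_test) if p == 1 and t == 1)
n_halo = sum(y_test)
n_flagged = sum(predicted)
recovery = true_pos / n_halo  # fraction of true halo stars recovered
contamination = (n_flagged - true_pos) / n_flagged if n_flagged else 0.0

print(f"recovery = {recovery:.2f}, contamination = {contamination:.2f}")
```

The two printed quantities are the kind of metrics reported in the Findings: recovery is the fraction of true halo stars that the classifier flags, and contamination is the fraction of flagged stars that are not halo stars. Because the toy sample is far cleaner than real Gaia data, the numbers it prints say nothing about the performance reported in the paper.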
