Machine Learning Classification of Gaia Data Release 2
Yu Bai, JiFeng Liu, Song Wang

TL;DR
This paper applies machine learning classification to Gaia DR2 data combined with Pan-STARRS 1 and AllWISE, achieving high accuracy in distinguishing stars, galaxies, and QSOs, and showing how parallax quality relates to object class.
Contribution
The study demonstrates the effectiveness of machine learning in classifying over 85 million Gaia DR2 objects with high accuracy, integrating multi-survey data for improved astrophysical object identification.
Findings
Total classification accuracy of 91.9%, measured on objects cross-matched with SIMBAD
Stars constitute approximately 98% of classified objects
A threshold of 0 < σπ/π < 0.2 yields a very clean stellar sample
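The parallax cut in the last finding can be applied as a simple row filter. The sketch below assumes each source carries a parallax π and its uncertainty σπ in mas, as in the `parallax` and `parallax_error` columns of the Gaia DR2 source table. The `Source` container, the helper names, and the sample values are hypothetical illustrations, not the authors' code or data.

```python
from dataclasses import dataclass


@dataclass
class Source:
    """Hypothetical container for the fields the cut needs."""
    source_id: int
    parallax: float        # pi, in mas
    parallax_error: float  # sigma_pi, in mas (assumed > 0)


def relative_parallax_error(src: Source) -> float:
    """Return sigma_pi / pi for one source (requires a nonzero parallax)."""
    return src.parallax_error / src.parallax


def is_clean_star_candidate(src: Source, max_rel_err: float = 0.2) -> bool:
    """Apply 0 < sigma_pi/pi < max_rel_err.

    With sigma_pi > 0, the lower bound is equivalent to requiring pi > 0,
    so zero and negative parallaxes are rejected before dividing.
    """
    if src.parallax <= 0:
        return False
    return 0 < relative_parallax_error(src) < max_rel_err


# Illustrative values only, not taken from the paper.
sources = [
    Source(1, 5.0, 0.1),   # sigma/pi = 0.02 -> kept
    Source(2, 0.5, 0.2),   # sigma/pi = 0.40 -> rejected
    Source(3, -0.3, 0.2),  # negative parallax -> rejected
    Source(4, 1.0, 0.19),  # sigma/pi = 0.19 -> kept
]
clean = [s.source_id for s in sources if is_clean_star_candidate(s)]
print(clean)  # [1, 4]
```

Checking π > 0 before dividing both avoids a division by zero and makes explicit that the lower bound of the cut already excludes the negative-parallax objects, which is where the galaxy and QSO contamination is found.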
Abstract
Machine learning has gained increasing popularity owing to its powerful ability to make predictions or calculated suggestions for large amounts of data. We apply machine learning classification to 85,613,922 objects in Gaia Data Release 2 (DR2), based on the combination of the Pan-STARRS 1 and AllWISE data. The classification results are cross-matched with the SIMBAD database, and the total accuracy is 91.9%. Our sample is dominated by stars (98%), while galaxies make up 2%. For objects with negative parallaxes, about 2.5% are galaxies and QSOs, while about 99.9% of objects are stars if their relative parallax uncertainties are smaller than 0.2. Our result implies that a threshold of 0 < σπ/π < 0.2 could yield a very clean stellar sample.
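The reported accuracy is the fraction of cross-matched objects whose predicted class agrees with the reference class. A minimal sketch of that bookkeeping follows. It assumes SIMBAD object types have already been mapped onto the three labels "STAR", "GALAXY", and "QSO", and that only objects with a SIMBAD counterpart are scored. The function names and toy labels are hypothetical and do not reproduce the paper's pipeline or results.

```python
from collections import Counter


def crossmatch_accuracy(predicted: dict, reference: dict) -> float:
    """Fraction of cross-matched ids whose predicted label equals the reference label.

    Only ids present in both dicts (the cross-matched subset) are evaluated.
    """
    common = predicted.keys() & reference.keys()
    if not common:
        raise ValueError("no cross-matched objects to evaluate")
    correct = sum(1 for k in common if predicted[k] == reference[k])
    return correct / len(common)


def class_fractions(predicted: dict) -> dict:
    """Share of each predicted class over the whole classified sample."""
    counts = Counter(predicted.values())
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


# Toy labels for illustration only.
predicted = {1: "STAR", 2: "STAR", 3: "GALAXY", 4: "QSO", 5: "STAR"}
reference = {1: "STAR", 2: "GALAXY", 3: "GALAXY", 4: "QSO"}  # id 5 has no match

acc = crossmatch_accuracy(predicted, reference)
print(acc)  # 0.75
print(class_fractions(predicted))
```

The two quantities use different denominators: accuracy is computed only over the cross-matched subset, while the class fractions (such as the 98% stellar share) describe the full classified sample.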
