Quasar and galaxy classification in Gaia Data Release 2
Coryn A.L. Bailer-Jones, Morgan Fouesneau, Rene Andrae (Max Planck Institute for Astronomy, Heidelberg)

TL;DR
This paper develops a Gaussian Mixture Model-based classifier that identifies stars, quasars, and galaxies in Gaia DR2 using only photometric and astrometric features. The classifier accounts for class imbalance and is applied to more than a billion objects.
Contribution
It introduces a probabilistic classification method for Gaia DR2 data that accounts for class imbalance, and uses it to identify millions of quasar and galaxy candidates.
Findings
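Classified 2.3 million objects as quasars and 0.37 million as galaxies in Gaia DR2.
Predicted purities of 0.43 for quasars and 0.28 for galaxies at a probability threshold of 0.5, with completenesses of 0.58 and 0.72 respectively.
Estimated that the true numbers of quasars and galaxies among these are 690,000 and 110,000 respectively.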
Abstract
We construct a supervised classifier based on Gaussian Mixture Models to probabilistically classify objects in Gaia data release 2 (GDR2) using only photometric and astrometric data in that release. The model is trained empirically to classify objects into three classes -- star, quasar, galaxy -- for G>=14.5 mag down to the Gaia magnitude limit of G=21.0 mag. Galaxies and quasars are identified for the training set by a cross-match to objects with spectroscopic classifications from the Sloan Digital Sky Survey. Stars are defined directly from GDR2. When allowing for the expectation that quasars are 500 times rarer than stars, and galaxies 7500 times rarer than stars (the class imbalance problem), samples classified with a threshold probability of 0.5 are predicted to have purities of 0.43 for quasars and 0.28 for galaxies, and completenesses of 0.58 and 0.72 respectively. The purities…
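To illustrate why the class imbalance matters for the quoted purities, here is a minimal sketch of how posterior probabilities from a classifier trained on balanced classes could be rescaled to priors in which quasars are 500 times and galaxies 7500 times rarer than stars. It uses a standard prior-shift correction and is not necessarily the authors' exact procedure. The function names, the balanced training prior, and the example posterior are all hypothetical.

```python
def class_priors(quasar_rarity=500.0, galaxy_rarity=7500.0):
    """Return normalized priors for star, quasar, galaxy.

    Quasars and galaxies are taken to be `quasar_rarity` and
    `galaxy_rarity` times rarer than stars (values from the abstract).
    """
    weights = {
        "star": 1.0,
        "quasar": 1.0 / quasar_rarity,
        "galaxy": 1.0 / galaxy_rarity,
    }
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def reweight_posterior(posterior, train_prior, target_prior):
    """Rescale posteriors computed under `train_prior` to `target_prior`.

    Each class probability is multiplied by target_prior / train_prior
    and the result is renormalized to sum to one.
    """
    unnormalized = {
        name: posterior[name] * target_prior[name] / train_prior[name]
        for name in posterior
    }
    total = sum(unnormalized.values())
    return {name: v / total for name, v in unnormalized.items()}


# Assumption: the classifier was trained on equal numbers per class.
balanced = {"star": 1 / 3, "quasar": 1 / 3, "galaxy": 1 / 3}
priors = class_priors()

# Hypothetical object that looks very quasar-like under balanced priors.
p_balanced = {"star": 0.01, "quasar": 0.98, "galaxy": 0.01}
p_adjusted = reweight_posterior(p_balanced, balanced, priors)

for name in ("star", "quasar", "galaxy"):
    print(f"{name}: balanced={p_balanced[name]:.3f} adjusted={p_adjusted[name]:.3f}")
```

In this sketch an object with a quasar probability of 0.98 under balanced priors falls below the 0.5 threshold once the rarity of quasars is taken into account, because even a small star likelihood is multiplied by a much larger prior. This is the mechanism by which class imbalance lowers the purity of the rare classes.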
