Identifying galaxies, quasars, and stars with machine learning: A new catalogue of classifications for 111 million SDSS sources without spectra

A. O. Clarke; A. M. M. Scaife; R. Greenhalgh; V. Griguta

arXiv:1909.10963·astro-ph.GA·July 15, 2020

TL;DR

This paper trains a random forest classifier on spectroscopically labelled SDSS sources and applies it to 111 million SDSS sources without spectra, classifying each one as a galaxy, quasar, or star.

Contribution

The study introduces an optimized random forest model and a comprehensive catalog of classifications for unlabelled SDSS sources, utilizing photometry and transfer learning techniques.

Findings

01

Classified 111 million sources (50.4 million galaxies, 2.1 million quasars, and 58.8 million stars), each with an individual classification probability.

02

Achieved strong agreement between UMAP visualizations and classifier labels.

03

Analyzed the impact of class imbalance and magnitude errors on classification performance.
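One way to see why class imbalance matters for these findings is to compute precision, recall, and F1 per class rather than a single overall accuracy. The sketch below is only an illustration: the confusion counts are invented, the helper name `per_class_metrics` is hypothetical, and nothing here reproduces the paper's own evaluation.

```python
def per_class_metrics(confusion, classes):
    """Per-class precision, recall, and F1.

    confusion[i][j] is the number of sources of true class i
    that were predicted as class j.
    """
    metrics = {}
    for k, name in enumerate(classes):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                    # true class k, predicted otherwise
        fp = sum(row[k] for row in confusion) - tp     # predicted k, true class otherwise
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics[name] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics


classes = ("galaxy", "quasar", "star")

# Invented counts (not from the paper), with quasars as the minority class.
confusion = [
    [900, 10, 90],   # true galaxies
    [20, 60, 20],    # true quasars
    [80, 30, 890],   # true stars
]

metrics = per_class_metrics(confusion, classes)
total = sum(sum(row) for row in confusion)
accuracy = sum(confusion[k][k] for k in range(len(classes))) / total

for name in classes:
    print(name, metrics[name])
print("overall accuracy:", accuracy)
```

In this toy case the overall accuracy is dominated by the two large classes, while the minority class scores noticeably lower on every per-class metric, which is the kind of effect a class-imbalance analysis is meant to expose.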

Abstract

We used 3.1 million spectroscopically labelled sources from the Sloan Digital Sky Survey (SDSS) to train an optimised random forest classifier using photometry from the SDSS and the Wide-field Infrared Survey Explorer (WISE). We applied this machine learning model to 111 million previously unlabelled sources from the SDSS photometric catalogue which did not have existing spectroscopic observations. Our new catalogue contains 50.4 million galaxies, 2.1 million quasars, and 58.8 million stars. We provide individual classification probabilities for each source, with 6.7 million galaxies (13%), 0.33 million quasars (15%), and 41.3 million stars (70%) having classification probabilities greater than 0.99; and 35.1 million galaxies (70%), 0.72 million quasars (34%), and 54.7 million stars (93%) having classification probabilities greater than 0.9. Precision, Recall, and F1 score were…
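The per-class counts above the 0.9 and 0.99 thresholds quoted in the abstract can be tallied from per-source probabilities along the following lines. This is a minimal sketch under stated assumptions: each source is assumed to carry three probabilities in the order (galaxy, quasar, star), the predicted class is taken to be the most probable one, and the function name `summarise_confidence` and the example rows are hypothetical rather than taken from the catalogue or its repository.

```python
from collections import Counter

CLASSES = ("galaxy", "quasar", "star")


def summarise_confidence(prob_rows, threshold):
    """Return {class: (n_above, n_total, fraction)} keyed by predicted class.

    Each row holds one probability per entry in CLASSES. The predicted
    class is the most probable one, and a source counts as "above" when
    that probability is strictly greater than the threshold.
    """
    totals = Counter()
    above = Counter()
    for row in prob_rows:
        best = max(range(len(CLASSES)), key=lambda i: row[i])
        label = CLASSES[best]
        totals[label] += 1
        if row[best] > threshold:
            above[label] += 1
    return {
        label: (above[label], totals[label], above[label] / totals[label])
        for label in CLASSES
        if totals[label] > 0
    }


# Made-up probabilities for six sources (illustrative only, not catalogue data).
example_rows = [
    (0.995, 0.003, 0.002),
    (0.93, 0.05, 0.02),
    (0.60, 0.30, 0.10),
    (0.05, 0.92, 0.03),
    (0.001, 0.004, 0.995),
    (0.02, 0.03, 0.95),
]

summary_090 = summarise_confidence(example_rows, 0.9)
summary_099 = summarise_confidence(example_rows, 0.99)

for label in CLASSES:
    print(label, "p > 0.9:", summary_090[label], "p > 0.99:", summary_099[label])
```

Raising the threshold from 0.9 to 0.99 can only shrink each class's count, which matches the pattern in the abstract's two sets of figures.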

Code & Models

Repositories

informationcake/SDSS-ML (official)
