Anomaly detection for machine learning redshifts applied to SDSS galaxies

Ben Hoyle; Markus Michael Rau; Kerstin Paech; Christopher Bonnett; Stella Seitz; Jochen Weller

arXiv:1503.08214 · astro-ph.CO · June 16, 2016

TL;DR

This paper demonstrates that removing anomalous training examples from SDSS galaxy data improves machine learning redshift estimation, with the redshift statistics improving by up to 80%.

Contribution

It applies anomaly detection (the Elliptical Envelope technique) to identify and remove unreliable training examples, improving machine learning redshift predictions for galaxies.

Findings

1. Up to 80% improvement in redshift estimation statistics
2. Effective use of the Elliptical Envelope technique for anomaly detection
3. A method to estimate the contamination fraction in data samples

Abstract

We present an analysis of anomaly detection for machine learning redshift estimation. Anomaly detection allows the removal of poor training examples, which can adversely influence redshift estimates. Anomalous training examples may be photometric galaxies with incorrect spectroscopic redshifts, or galaxies with one or more poorly measured photometric quantities. We select 2.5 million 'clean' SDSS DR12 galaxies with reliable spectroscopic redshifts, and 6730 'anomalous' galaxies whose spectroscopic redshift measurements are flagged as unreliable. We contaminate the clean base galaxy sample with these unreliable-redshift galaxies and attempt to recover the contaminating galaxies using the Elliptical Envelope technique. We then train four machine learning architectures for redshift analysis on both the contaminated sample and on the preprocessed 'anomaly-removed' sample and measure…
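The contaminate-then-recover experiment described in the abstract can be sketched with scikit-learn's `EllipticEnvelope`, which fits a robust (minimum covariance determinant) Gaussian to the inputs and flags points with large Mahalanobis distance. This is a toy sketch under stated assumptions, not the authors' pipeline: the synthetic "photometry", the number of bands, the noise levels, the anomaly model (one badly measured band plus an unreliable redshift label), the random forest regressor, and the RMS metric are all hypothetical stand-ins for the SDSS DR12 data and the four architectures used in the paper. The `contamination` parameter is set to the known injected fraction here, which is only possible because the data are simulated.

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(42)
n_bands = 5  # hypothetical number of photometric features


def simulate(n, rng):
    """Hypothetical toy photometry: each 'band' scales linearly with z, plus noise."""
    z = rng.uniform(0.0, 0.6, n)
    slopes = np.arange(1, n_bands + 1)
    X = z[:, None] * slopes + rng.normal(0.0, 0.05, (n, n_bands))
    return X, z


# 'Clean' training sample and an independent clean test sample.
X_clean, z_clean = simulate(5000, rng)
X_test, z_test = simulate(2000, rng)

# Hypothetical 'anomalous' galaxies: one badly measured band and an unreliable z label.
n_anom = 150
X_anom, _ = simulate(n_anom, rng)
bad_band = rng.integers(0, n_bands, n_anom)
offset = rng.choice([-1.0, 1.0], n_anom) * rng.uniform(0.5, 1.5, n_anom)
X_anom[np.arange(n_anom), bad_band] += offset
z_anom = rng.uniform(0.0, 0.6, n_anom)

# Contaminate the clean base sample with the anomalous galaxies.
X_train = np.vstack([X_clean, X_anom])
z_train = np.concatenate([z_clean, z_anom])
is_anom = np.concatenate([np.zeros(len(z_clean), bool), np.ones(n_anom, bool)])

# Fit the Elliptical Envelope on the photometric features only.
# fit_predict returns +1 for inliers and -1 for outliers.
env = EllipticEnvelope(contamination=n_anom / len(z_train), random_state=0)
keep = env.fit_predict(X_train) == 1

# How well were the injected anomalies recovered?
flagged = ~keep
recall = (flagged & is_anom).sum() / is_anom.sum()
precision = (flagged & is_anom).sum() / max(flagged.sum(), 1)


def fit_and_score(X, z):
    """Train a regressor and return the RMS redshift error on the clean test set."""
    model = RandomForestRegressor(n_estimators=50, random_state=0, n_jobs=-1)
    model.fit(X, z)
    return float(np.sqrt(np.mean((model.predict(X_test) - z_test) ** 2)))


# Compare training on the contaminated vs the 'anomaly-removed' sample.
rms_contaminated = fit_and_score(X_train, z_train)
rms_cleaned = fit_and_score(X_train[keep], z_train[keep])
print(f"recall={recall:.2f} precision={precision:.2f}")
print(f"RMS contaminated={rms_contaminated:.4f} cleaned={rms_cleaned:.4f}")
```

In this toy setup the corrupted band pushes each anomaly far off the narrow locus traced by the clean galaxies, so the envelope recovers most of them. How much removal helps the downstream regressor depends on the data and the model, so the two RMS values are printed rather than assumed.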
