Anomaly detection for machine learning redshifts applied to SDSS galaxies
Ben Hoyle, Markus Michael Rau, Kerstin Paech, Christopher Bonnett, Stella Seitz, Jochen Weller

TL;DR
This paper shows that removing anomalous training examples from SDSS galaxy data improves machine learning redshift estimates, with the measured redshift estimation statistics improving by up to 80%.
Contribution
It introduces an anomaly detection method to identify and remove unreliable training data, enhancing machine learning redshift predictions for galaxies.
Findings
Up to 80% improvement in redshift estimation statistics
Effective use of Elliptical Envelope technique for anomaly detection
Method to estimate contamination fraction in data samples
Abstract
We present an analysis of anomaly detection for machine learning redshift estimation. Anomaly detection allows the removal of poor training examples, which can adversely influence redshift estimates. Anomalous training examples may be photometric galaxies with incorrect spectroscopic redshifts, or galaxies with one or more poorly measured photometric quantities. We select 2.5 million 'clean' SDSS DR12 galaxies with reliable spectroscopic redshifts, and 6730 'anomalous' galaxies with spectroscopic redshift measurements which are flagged as unreliable. We contaminate the clean base galaxy sample with galaxies with unreliable redshifts and attempt to recover the contaminating galaxies using the Elliptical Envelope technique. We then train four machine learning architectures for redshift analysis on both the contaminated sample and on the preprocessed 'anomaly-removed' sample and measure…
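The contaminate-then-recover step described in the abstract can be illustrated with a minimal sketch. It assumes scikit-learn's EllipticEnvelope as the Elliptical Envelope implementation and uses synthetic two-dimensional data in place of SDSS photometry. The feature distributions, sample sizes, and variable names are hypothetical, and the contamination fraction is taken as known here rather than estimated as in the paper.

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope

rng = np.random.default_rng(0)

# Synthetic stand-ins for photometric features (NOT SDSS data; sizes are hypothetical).
n_clean, n_anom = 2000, 100
clean = rng.multivariate_normal(
    mean=[0.0, 0.0], cov=[[1.0, 0.6], [0.6, 1.0]], size=n_clean
)
# "Anomalous" objects are scattered widely around the clean population.
anomalous = rng.uniform(low=-8.0, high=8.0, size=(n_anom, 2))

# Contaminate the clean sample and remember which rows were injected.
X = np.vstack([clean, anomalous])
is_anomaly = np.concatenate([np.zeros(n_clean, dtype=bool), np.ones(n_anom, dtype=bool)])

# Assumption: the contamination fraction is known in advance.
contamination = n_anom / (n_clean + n_anom)

# Fit a robust Gaussian envelope; fit_predict returns +1 for inliers, -1 for outliers.
detector = EllipticEnvelope(contamination=contamination, random_state=0)
labels = detector.fit_predict(X)
flagged = labels == -1

# Fraction of injected anomalies that the detector recovered.
recovered = np.sum(flagged & is_anomaly) / n_anom

# The "anomaly-removed" sample that would be passed to a redshift estimator.
X_cleaned = X[~flagged]

print(f"flagged {flagged.sum()} objects, recovered fraction = {recovered:.2f}")
```

In this sketch, X_cleaned plays the role of the preprocessed 'anomaly-removed' training sample. Anomalies that happen to fall inside the bulk of the clean distribution are not expected to be recovered, so the recovered fraction will generally be below one.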
