Data augmentation for machine learning redshifts applied to SDSS galaxies

Ben Hoyle; Markus Michael Rau; Christopher Bonnett; Stella Seitz; Jochen Weller

arXiv:1501.06759 · astro-ph.CO · June 23, 2015

TL;DR

This paper demonstrates that data augmentation significantly improves machine learning redshift estimates for SDSS galaxies by reducing errors and outliers, especially when training data is biased or limited.

Contribution

It introduces a novel data augmentation approach that uses simulations and K-corrections to improve redshift estimates when the training sample is biased relative to the test sample.

Findings

01. Reduces the redshift error by 40% with augmentation.

02. Decreases the outlier fraction by up to 80%.

03. Maintains negligible bias across magnitudes.
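
The three findings refer to summary statistics of the redshift residuals. As a rough illustration of how such statistics are often computed, here is a minimal sketch. The definitions are assumptions and are not taken from this page: the residual is scaled as (z_pred - z_spec) / (1 + z_spec), the bias is the median residual, the error is a 68th-percentile spread, and an outlier is any galaxy whose scaled residual exceeds 0.15 in absolute value. The function name `redshift_metrics` is hypothetical, and the paper may use different conventions.

```python
import statistics


def redshift_metrics(z_spec, z_pred, outlier_cut=0.15):
    """Summarise redshift residuals (assumed, commonly used definitions)."""
    # Scaled residual for each galaxy; the sign convention is an assumption.
    resid = [(p - s) / (1.0 + s) for s, p in zip(z_spec, z_pred)]

    # Bias: median of the scaled residuals.
    bias = statistics.median(resid)

    # Error: 68th percentile of the absolute deviation from the median.
    abs_dev = sorted(abs(r - bias) for r in resid)
    idx = min(len(abs_dev) - 1, int(0.68 * len(abs_dev)))
    sigma68 = abs_dev[idx]

    # Outlier fraction: share of galaxies beyond the chosen threshold.
    outlier_fraction = sum(abs(r) > outlier_cut for r in resid) / len(resid)

    return {"bias": bias, "sigma68": sigma68, "outlier_fraction": outlier_fraction}
```

Computing these statistics once for a model trained on the base sample and once for a model trained on the augmented sample would give the kind of relative improvements quoted in the findings.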

Abstract

We present analyses of data augmentation for machine learning redshift estimation. Data augmentation makes a training sample more closely resemble a test sample, if the two base samples differ, in order to improve measured statistics of the test sample. We perform two sets of analyses by selecting 800k (1.7M) SDSS DR8 (DR10) galaxies with spectroscopic redshifts. We construct a base training set by imposing an artificial r band apparent magnitude cut to select only bright galaxies and then augment this base training set by using simulations and by applying the K-correct package to artificially place training set galaxies at a higher redshift. We obtain redshift estimates for the remaining faint galaxy sample, which are not used during training. We find that data augmentation reduces the error on the recovered redshifts by 40% in both sets of analyses, when compared to the difference…
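
The procedure described in the abstract (a magnitude cut to build a bright training set, then artificially moving training galaxies to higher redshift) can be sketched roughly as follows. This is a simplified illustration and not the authors' pipeline. It dims galaxies using only the luminosity distance in an assumed flat LambdaCDM cosmology (H0 = 70, Omega_m = 0.3) and ignores the bandpass shift that the K-correct package accounts for. The toy catalogue, the cut value, and all function names are hypothetical.

```python
import math
import random

C_KM_S = 299792.458  # speed of light in km/s


def luminosity_distance(z, h0=70.0, omega_m=0.3, steps=200):
    """Luminosity distance in Mpc for flat LambdaCDM (trapezoid rule)."""
    if z <= 0.0:
        return 0.0

    def inv_e(zp):
        return 1.0 / math.sqrt(omega_m * (1.0 + zp) ** 3 + (1.0 - omega_m))

    dz = z / steps
    integral = 0.5 * (inv_e(0.0) + inv_e(z))
    for i in range(1, steps):
        integral += inv_e(i * dz)
    integral *= dz
    return (1.0 + z) * (C_KM_S / h0) * integral


def shift_magnitude(mag, z_old, z_new):
    """Apparent magnitude after moving a galaxy from z_old to z_new.

    Distance dimming only: no K-correction and no evolution (a simplification).
    """
    ratio = luminosity_distance(z_new) / luminosity_distance(z_old)
    return mag + 5.0 * math.log10(ratio)


def split_by_magnitude(catalog, r_cut):
    """Bright galaxies form the base training set; faint ones are the test set."""
    bright = [g for g in catalog if g["r"] < r_cut]
    faint = [g for g in catalog if g["r"] >= r_cut]
    return bright, faint


def augment(bright, r_cut, z_step=0.05, max_copies=5):
    """Add redshifted copies of bright galaxies that end up fainter than the cut."""
    extra = []
    for g in bright:
        for k in range(1, max_copies + 1):
            z_new = g["z"] + k * z_step
            r_new = shift_magnitude(g["r"], g["z"], z_new)
            if r_new >= r_cut:  # keep only copies that resemble the faint sample
                extra.append({"r": r_new, "z": z_new})
    return bright + extra


# Toy catalogue with made-up magnitudes and redshifts (not SDSS data).
random.seed(0)
catalog = [{"r": random.uniform(15.0, 20.0), "z": random.uniform(0.02, 0.4)}
           for _ in range(200)]

R_CUT = 17.5  # hypothetical r-band cut
bright, faint = split_by_magnitude(catalog, R_CUT)
augmented = augment(bright, R_CUT)
```

A machine learning regressor would then be trained once on `bright` and once on `augmented`, and both models evaluated on `faint`, which is never used during training. A real application would also need to shift the colours of each galaxy, which is where a K-correction becomes essential.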
