Data augmentation for machine learning redshifts applied to SDSS galaxies
Ben Hoyle, Markus Michael Rau, Christopher Bonnett, Stella Seitz, Jochen Weller

TL;DR
This paper demonstrates that data augmentation significantly improves machine learning redshift estimates for SDSS galaxies by reducing errors and outliers, especially when training data is biased or limited.
Contribution
It introduces a novel data augmentation approach using simulations and K-corrections to enhance redshift estimation accuracy in biased training samples.
Findings
Data augmentation reduces the error on the recovered redshifts by 40% in both the DR8 and DR10 analyses.
Decreases outlier fraction by up to 80%.
Maintains negligible bias across magnitudes.
Abstract
We present analyses of data augmentation for machine learning redshift estimation. Data augmentation makes a training sample more closely resemble a test sample, if the two base samples differ, in order to improve measured statistics of the test sample. We perform two sets of analyses by selecting 800k (1.7M) SDSS DR8 (DR10) galaxies with spectroscopic redshifts. We construct a base training set by imposing an artificial r-band apparent magnitude cut to select only bright galaxies and then augment this base training set by using simulations and by applying the K-correct package to artificially place training set galaxies at a higher redshift. We obtain redshift estimates for the remaining faint galaxy sample, which is not used during training. We find that data augmentation reduces the error on the recovered redshifts by 40% in both sets of analyses, when compared to the difference…
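The augmentation step described in the abstract can be illustrated with a rough sketch. This is not the authors' pipeline: it omits the K-correct package and the simulations entirely, and only dims every band by the change in distance modulus when a bright training galaxy is moved to a higher redshift. The function names, the flat ΛCDM parameters (H0 = 70, Ωm = 0.3), the magnitude cut value, and the toy arrays are all assumptions made for illustration.

```python
import numpy as np

C_KM_S = 299792.458  # speed of light in km/s


def luminosity_distance_mpc(z, h0=70.0, omega_m=0.3, n_steps=2048):
    """Luminosity distance in Mpc for a flat LCDM cosmology.

    The cosmological parameters are assumed values, not taken from the paper.
    Requires z > 0 wherever the result is later used inside a logarithm.
    """
    z = np.atleast_1d(np.asarray(z, dtype=float))
    # One integration grid per galaxy, running from 0 to that galaxy's z.
    grid = np.linspace(0.0, 1.0, n_steps)[None, :] * z[:, None]
    inv_e = 1.0 / np.sqrt(omega_m * (1.0 + grid) ** 3 + (1.0 - omega_m))
    step = grid[:, 1:] - grid[:, :-1]
    # Trapezoidal rule for the comoving distance integral.
    comoving = (C_KM_S / h0) * np.sum(0.5 * (inv_e[:, 1:] + inv_e[:, :-1]) * step, axis=1)
    return (1.0 + z) * comoving


def split_by_magnitude(r_mag, r_cut):
    """Boolean masks: bright galaxies (base training set) and the faint remainder."""
    bright = np.asarray(r_mag) < r_cut
    return bright, ~bright


def shift_to_higher_redshift(mags, z, z_new):
    """Dim all bands by the distance-modulus difference between z and z_new.

    Simplification: no K-correction is applied, so band-dependent colour
    changes with redshift are ignored. The paper uses K-correct for this.
    """
    mags = np.asarray(mags, dtype=float)
    delta_mu = 5.0 * np.log10(luminosity_distance_mpc(z_new) / luminosity_distance_mpc(z))
    return mags + delta_mu[:, None]


# Toy example with made-up magnitudes in two bands (column 0 is the r band).
mags = np.array([[17.2, 16.8], [17.5, 17.1], [19.4, 19.0]])
z_spec = np.array([0.08, 0.10, 0.25])

bright, faint = split_by_magnitude(mags[:, 0], r_cut=18.0)  # hypothetical cut
z_shifted = z_spec[bright] + 0.1                            # hypothetical shift
mags_shifted = shift_to_higher_redshift(mags[bright], z_spec[bright], z_shifted)

# Augmented training set: original bright galaxies plus their redshifted copies.
train_mags = np.vstack([mags[bright], mags_shifted])
train_z = np.concatenate([z_spec[bright], z_shifted])
```

The redshifted copies are fainter than their originals, so the augmented training set reaches into the magnitude range of the faint test sample. A realistic version would also need the band-dependent K-correction and suitably scaled photometric errors.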
