Optimizing Photometric Redshift Training Sets I: Efficient Compression of the Galaxy Color-Redshift Relation with UMAP
Finian Ashmead, Jeffrey A. Newman, Brett H. Andrews, Rachel Bezanson, Biprateep Dey, Daniel C. Masters, and S.A. Stanford

TL;DR
This paper introduces a novel approach using UMAP to compress galaxy color space for improved photometric redshift estimation, enabling better interpolation and handling of biased spectroscopic samples.
Contribution
The study demonstrates that UMAP-based compression combined with nearest neighbors enhances photo-$z$ accuracy and robustness, especially with biased spectroscopic training data.
Findings
UMAP compression yields a thin manifold along which redshift and specific star formation rate vary continuously and monotonically.
UMAP-$k$NN-$z$ outperforms SOM-$z$ in reducing scatter and outliers.
Interpolating within the UMAP space improves photo-$z$ estimates in sparsely sampled regions.
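The "scatter" and "outliers" in the findings above are usually quantified from the normalized residuals $\Delta z = (z_{\rm phot} - z_{\rm spec})/(1 + z_{\rm spec})$. The sketch below computes two common summary statistics, the normalized median absolute deviation (NMAD) and the fraction of objects with $|\Delta z|$ above a threshold. The function name `photoz_metrics` and the default cut of 0.15 are illustrative assumptions; this summary does not state which definitions the paper adopts.

```python
from statistics import median


def photoz_metrics(z_phot, z_spec, outlier_cut=0.15):
    """Return (NMAD scatter, outlier fraction) for paired redshift lists.

    Assumption: outlier_cut=0.15 is a conventional choice, not a value
    taken from the paper.
    """
    # Normalized residuals, (z_phot - z_spec) / (1 + z_spec).
    dz = [(zp - zs) / (1.0 + zs) for zp, zs in zip(z_phot, z_spec)]
    center = median(dz)
    # 1.4826 scales the median absolute deviation to a Gaussian sigma.
    nmad = 1.4826 * median([abs(d - center) for d in dz])
    # Fraction of objects whose residual exceeds the chosen cut.
    outlier_frac = sum(abs(d) > outlier_cut for d in dz) / len(dz)
    return nmad, outlier_frac


if __name__ == "__main__":
    # Toy values, not data from the paper.
    nmad, frac = photoz_metrics([0.5, 1.0, 1.5, 3.5], [0.5, 1.0, 1.5, 2.0])
    print(nmad, frac)
```

Because NMAD is built on medians, a handful of catastrophic failures barely moves it, which is why the outlier fraction is reported alongside it.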
Abstract
Spectroscopic datasets are essential for training and calibrating photometric redshift (photo-$z$) methods. However, spectroscopic redshifts (spec-$z$'s) constitute a biased and sparse sampling of the photometric galaxy population, which creates difficulties for the common grid-based approach for mapping color to redshift using self-organizing maps (SOMs). Instead, we utilized the uniform manifold approximation and projection (UMAP) algorithm to compress a Rubin-Roman-like color space into a thin and densely-sampled manifold. Crucially, the manifold varies continuously and monotonically in redshift and specific star formation rate in roughly orthogonal directions. Using 110,000 COSMOS2020 many-band photo-$z$'s and 15,000 spec-$z$'s as representative and non-representative samples, respectively, we trained and tested redshift estimation from a SOM (SOM-$z$) and…
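To make the nearest-neighbor step concrete, here is a minimal sketch of a $k$NN redshift estimate in a low-dimensional embedding. It assumes the embedding coordinates have already been produced by a UMAP fit to galaxy colors (for example with the `umap-learn` package), which is not shown. The function name `knn_photoz`, the use of the median, and the default `k=5` are illustrative assumptions; the truncated abstract does not specify the estimator the authors use.

```python
import math
from statistics import median


def knn_photoz(train_xy, train_z, query_xy, k=5):
    """Median redshift of the k training points nearest to query_xy.

    train_xy : list of embedding coordinates (assumed to come from UMAP)
    train_z  : redshifts of the training galaxies, same order as train_xy
    query_xy : embedding coordinates of the galaxy to estimate
    """
    if k < 1 or k > len(train_xy):
        raise ValueError("k must be between 1 and the training-set size")
    # Rank training galaxies by Euclidean distance in the embedding.
    ranked = sorted(
        zip(train_xy, train_z),
        key=lambda pair: math.dist(pair[0], query_xy),
    )
    # Summarize the neighbors' redshifts; the median is an assumption.
    return median([z for _, z in ranked[:k]])


if __name__ == "__main__":
    # Toy embedding, not data from the paper.
    xy = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
    z = [0.1, 0.2, 0.3, 1.0, 1.1]
    print(knn_photoz(xy, z, (1.1, 0.0), k=3))
```

Unlike a SOM cell lookup, the neighbors here are chosen by distance rather than by grid membership, so a query that falls between training points still draws on the closest available redshifts.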
Taxonomy
Topics: Galaxies: Formation, Evolution, Phenomena · Gaussian Processes and Bayesian Inference · Machine Learning and Data Classification
