Diagnosing the Effects of Spectroscopic Training Set Imperfection on Photometric Redshift Performance

Alice Crafford; Alex I. Malz; Tianqing Zhang; Rachel Mandelbaum; Olivia Lynn; Federico Berlfein; Johann Cohen-Tanugi; John Franklin Crenshaw; Qianjun Hang; Irene Moskowitz; Drew Oldag; Samuel J. Schmidt; Ziang Yan; the LSST Dark Energy Science Collaboration

arXiv:2601.10797·astro-ph.IM·January 19, 2026

Open Access

TL;DR

This study evaluates how different types of spectroscopic training set imperfections influence photometric redshift estimation, identifying effective metrics for diagnosing biases and assessing algorithm robustness in astronomical surveys.

Contribution

This work systematically tests multiple photo-$z$ algorithms on degraded training data to identify the metrics that best diagnose the impact of training set imperfections.

Findings

01

KL divergence, Wasserstein distance, and the probability integral transform (PIT) are effective metrics for diagnosing the impact of training set imperfections.

02

Inverse redshift incompleteness alone does not fully capture real-world training data biases.

03

The tested algorithms differ in their sensitivity to different types of training set degradation.
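The three metrics named in the findings can be illustrated on toy data. The following is a minimal sketch using only the Python standard library, not the paper's pipeline or the RAIL API: the helper names, the uniform mock redshift distribution, the Gaussian form of each estimated PDF, and the 0.1 redshift offset are all assumptions made for illustration.

```python
import math
import random
from statistics import NormalDist


def histogram(samples, edges):
    """Bin probabilities of samples over the given edges (out-of-range samples dropped)."""
    counts = [0] * (len(edges) - 1)
    for x in samples:
        for i in range(len(counts)):
            if edges[i] <= x < edges[i + 1]:
                counts[i] += 1
                break
    total = sum(counts)
    return [c / total for c in counts]


def kl_divergence(p, q, eps=1e-12):
    """Discrete KL divergence D(p || q) in nats, summed over bins where p > 0."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def wasserstein_1d(a, b):
    """1-Wasserstein distance between two equal-size 1D samples (sorted pairing)."""
    assert len(a) == len(b)
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


def pit_values(true_z, means, sigmas):
    """PIT: each estimated CDF evaluated at the true redshift (Gaussian PDFs assumed)."""
    return [NormalDist(m, s).cdf(z) for z, m, s in zip(true_z, means, sigmas)]


# Toy data (hypothetical): true redshifts plus two sets of mock estimates.
rng = random.Random(0)
sigma = 0.05
z_true = [rng.uniform(0.2, 1.8) for _ in range(2000)]
z_good = [z + rng.gauss(0.0, sigma) for z in z_true]          # unbiased scatter
z_biased = [z + 0.1 + rng.gauss(0.0, sigma) for z in z_true]  # systematic offset

edges = [0.1 * i for i in range(23)]  # bins of width 0.1 covering 0 to 2.2
p_true = histogram(z_true, edges)
kl_good = kl_divergence(p_true, histogram(z_good, edges))
kl_biased = kl_divergence(p_true, histogram(z_biased, edges))

w_good = wasserstein_1d(z_true, z_good)
w_biased = wasserstein_1d(z_true, z_biased)

sigmas = [sigma] * len(z_true)
pit_good = pit_values(z_true, z_good, sigmas)
pit_biased = pit_values(z_true, z_biased, sigmas)
mean_pit_good = sum(pit_good) / len(pit_good)
mean_pit_biased = sum(pit_biased) / len(pit_biased)

print(f"KL:          good={kl_good:.4f}  biased={kl_biased:.4f}")
print(f"Wasserstein: good={w_good:.4f}  biased={w_biased:.4f}")
print(f"mean PIT:    good={mean_pit_good:.3f}  biased={mean_pit_biased:.3f}")
```

In this toy setup the KL divergence and Wasserstein distance compare the ensemble redshift distributions, while the PIT checks the calibration of individual PDFs: a well-calibrated estimator gives PIT values that are roughly uniform on [0, 1], so a mean PIT far from 0.5 signals a systematic offset.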

Abstract

Most LSST extragalactic science will rely on photometric redshifts (photo-$z$) to extract distance information for the galaxies. However, an incomplete or non-representative training set can introduce bias into photo-$z$ estimation. It is necessary to understand how various forms of training set imperfection, such as incompleteness and non-trivial spectroscopic target selection, affect photo-$z$ estimation algorithms, and to identify metrics best-suited to quantify the impact. This work aims to systematically study metrics for diagnosing how various photo-$z$ methods react to certain types of training set incompleteness and non-representativeness. We use methods available through the open-source Python library Redshift Assessment Infrastructure Layers (RAIL) to systematically test the algorithms CMNN, GPz, FlexZBoost, and PZFlow on mock training data degraded in accordance with several…
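To make the idea of a degraded training set concrete, here is a minimal sketch of a redshift-dependent incompleteness cut. It assumes one simple form of inverse redshift incompleteness, in which a galaxy at redshift z is kept with probability min(1, pivot_z / z). That functional form, the parameter name `pivot_z`, the helper names, and the mock redshift range are illustrative assumptions; the sketch does not call RAIL and is not the paper's degradation code.

```python
import random


def keep_probability(z, pivot_z):
    """Survival probability: 1 at or below the pivot, falling as pivot_z / z above it."""
    return min(1.0, pivot_z / z)


def degrade(redshifts, pivot_z, rng):
    """Return the subset of redshifts that survives the incompleteness cut."""
    return [z for z in redshifts if rng.random() < keep_probability(z, pivot_z)]


# Mock "complete" sample (hypothetical): uniform in redshift for simplicity.
rng = random.Random(42)
full_sample = [rng.uniform(0.05, 3.0) for _ in range(5000)]
training_sample = degrade(full_sample, pivot_z=1.0, rng=rng)

mean_full = sum(full_sample) / len(full_sample)
mean_train = sum(training_sample) / len(training_sample)

print(f"kept {len(training_sample)} of {len(full_sample)} galaxies")
print(f"mean redshift: full={mean_full:.3f}  training={mean_train:.3f}")
```

Because the cut removes high-redshift galaxies preferentially, the surviving training sample has a lower mean redshift than the full sample, which is the kind of non-representativeness the abstract describes.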

Taxonomy

Topics: Galaxies: Formation, Evolution, Phenomena · Astronomy and Astrophysical Research · Gaussian Processes and Bayesian Inference