# Improved photometric redshift estimations through self-organising map-based data augmentation

**Authors:** Yun-Hao Zhang, Joe Zuntz, Irene Moskowitz, Eric Gawiser, Konrad Kuijken, Marika Asgari, Henk Hoekstra, Alex I. Malz, Ziang Yan, Tianqing Zhang, The LSST Dark Energy Science Collaboration

arXiv: 2508.20903 · 2026-04-10

## TL;DR

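This paper presents a data augmentation framework based on Self-Organising Maps (SOMs): simulated galaxies are added to the regions of the map that existing spectroscopic observations sample sparsely. On mock catalogues emulating LSST selections, this reduces systematic biases and catastrophic failures in photometric redshift estimates, particularly for high-redshift galaxies ($z_\mathrm{true} \gtrsim 1.5$).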

## Contribution

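Self-Organising Maps are used to target data augmentation at the regions of galaxy SED space that spectroscopic training samples cover sparsely. In tests on mock catalogues mimicking LSST Y1 and Y10 photometric selections, this markedly improves photometric redshift performance over models trained without augmentation.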

## Key findings

- Notably reduced systematic biases in photometric redshift estimates, particularly for high-redshift galaxies ($z_\mathrm{true} \gtrsim 1.5$).
- Catastrophic failures decreased by up to approximately a factor of 2.
- Reduced information loss in the conditional density estimations.
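
To make the first two findings concrete, the sketch below computes a point-estimate bias and a catastrophic failure fraction for a handful of made-up redshifts. The definitions are assumptions taken from common practice, not from the paper: the normalised residual $(z_\mathrm{phot} - z_\mathrm{true})/(1 + z_\mathrm{true})$, and a hypothetical outlier threshold of 0.15. The function name `point_metrics` and the example values are illustrative only.

```python
import numpy as np

def point_metrics(z_phot, z_true, outlier_cut=0.15):
    """Bias and catastrophic failure fraction from point estimates.

    Assumed conventions (not taken from the paper):
      dz = (z_phot - z_true) / (1 + z_true)
      a catastrophic failure is any galaxy with |dz| > outlier_cut
    """
    z_phot = np.asarray(z_phot, dtype=float)
    z_true = np.asarray(z_true, dtype=float)
    dz = (z_phot - z_true) / (1.0 + z_true)
    return {
        "bias": float(np.mean(dz)),
        "outlier_fraction": float(np.mean(np.abs(dz) > outlier_cut)),
    }

# Made-up values for illustration; the third galaxy is a catastrophic failure.
z_true = np.array([0.5, 1.0, 2.0, 2.5])
z_phot = np.array([0.52, 0.95, 0.40, 2.60])

all_metrics = point_metrics(z_phot, z_true)

# Restrict to the high-redshift regime highlighted in the abstract.
high_z = z_true >= 1.5
high_z_metrics = point_metrics(z_phot[high_z], z_true[high_z])

print(all_metrics)
print(high_z_metrics)
```

Evaluating the same metrics on the high-redshift subset separately is useful here because the abstract reports the largest gains for $z_\mathrm{true} \gtrsim 1.5$.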

## Abstract

We introduce a framework for the enhanced estimation of photometric redshifts using Self-Organising Maps (SOMs). Our method projects galaxy Spectral Energy Distributions (SEDs) onto a two-dimensional map, identifying regions that are sparsely sampled by existing spectroscopic observations. These under-sampled areas are then augmented with simulated galaxies, yielding a more representative spectroscopic training dataset. To assess the efficacy of this SOM-based data augmentation in the context of the forthcoming Legacy Survey of Space and Time (LSST), we employ mock galaxy catalogues from the OpenUniverse2024 project and generate synthetic datasets that mimic the expected photometric selections of LSST after one (Y1) and ten (Y10) years of observation. We construct 501 degraded realisations by sampling galaxy colours, magnitudes, redshifts and spectroscopic success rates, in order to emulate the compilation of a wide array of realistic spectroscopic surveys. Augmenting the degraded mock datasets with simulated galaxies from the independent CosmoDC2 catalogues has markedly improved the performance of our photometric redshift estimates compared to models lacking this augmentation, particularly for high-redshift galaxies ($z_\mathrm{true} \gtrsim 1.5$). This improvement is manifested in notably reduced systematic biases and a decrease in catastrophic failures by up to approximately a factor of 2, along with a reduction in information loss in the conditional density estimations. These results underscore the effectiveness of SOM-based augmentation in refining photometric redshift estimation, thereby enabling more robust analyses in cosmology and astrophysics for the NSF-DOE Vera C. Rubin Observatory.
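
The augmentation idea in the abstract can be sketched in a few lines of NumPy. This is a toy illustration under stated assumptions, not the authors' pipeline: the two-dimensional features stand in for colours and magnitudes, the SOM is a minimal online implementation, and the rule for deciding that a cell is under-sampled (spectroscopic counts falling below the share implied by the target photometric sample) is a hypothetical choice. All function names (`train_som`, `best_matching_cells`, `select_augmentation`) are introduced here for illustration.

```python
import numpy as np

def train_som(data, grid=(6, 6), n_iter=3000, lr0=0.5, sigma0=3.0, seed=0):
    """Train a small rectangular SOM with online updates.

    Returns weights of shape (rows, cols, n_features).
    """
    rng = np.random.default_rng(seed)
    rows, cols = grid
    # Initialise each cell from a randomly chosen data point.
    init = data[rng.integers(0, len(data), rows * cols)]
    weights = init.reshape(rows, cols, -1).astype(float)
    # Grid coordinates of every cell, used by the neighbourhood function.
    yy, xx = np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij")
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)              # learning rate decays linearly
        sigma = sigma0 * (1.0 - frac) + 0.5  # neighbourhood shrinks over time
        x = data[rng.integers(0, len(data))]
        d2 = np.sum((weights - x) ** 2, axis=2)
        bi, bj = np.unravel_index(np.argmin(d2), d2.shape)
        h = np.exp(-((yy - bi) ** 2 + (xx - bj) ** 2) / (2.0 * sigma ** 2))
        weights += lr * h[..., None] * (x - weights)
    return weights

def best_matching_cells(data, weights):
    """Flat index of the closest SOM cell for each row of data."""
    flat = weights.reshape(-1, weights.shape[-1])
    d2 = ((data[:, None, :] - flat[None, :, :]) ** 2).sum(axis=2)
    return np.argmin(d2, axis=1)

def select_augmentation(spec, sim, target, weights, rng):
    """Pick simulated galaxies for cells that the spectroscopic sample
    under-populates relative to the target sample (assumed criterion)."""
    n_cells = weights.shape[0] * weights.shape[1]
    spec_counts = np.bincount(best_matching_cells(spec, weights), minlength=n_cells)
    target_counts = np.bincount(best_matching_cells(target, weights), minlength=n_cells)
    sim_cells = best_matching_cells(sim, weights)
    # Spectroscopic count each cell would hold if it followed the target sample.
    expected = len(spec) * target_counts / len(target)
    deficit = np.clip(np.ceil(expected - spec_counts), 0, None).astype(int)
    chosen = []
    for cell in np.flatnonzero(deficit):
        candidates = np.flatnonzero(sim_cells == cell)
        n_take = min(int(deficit[cell]), len(candidates))
        if n_take > 0:
            chosen.append(rng.choice(candidates, size=n_take, replace=False))
    return np.concatenate(chosen) if chosen else np.array([], dtype=int)

# Toy data: the target sample fills the unit square, while the spectroscopic
# sample only covers the half with feature 0 below 0.5.
rng = np.random.default_rng(42)
target = rng.uniform(0, 1, size=(4000, 2))
spec = rng.uniform(0, 1, size=(500, 2)) * np.array([0.5, 1.0])
sim = rng.uniform(0, 1, size=(4000, 2))

weights = train_som(target)
sel = select_augmentation(spec, sim, target, weights, rng)
augmented = np.vstack([spec, sim[sel]])
print(len(spec), len(sel), augmented.shape)
```

In this toy setup the selected simulated galaxies fall mostly in the half of feature space that the spectroscopic sample misses, which is the behaviour the abstract describes for sparsely sampled SOM regions. A real application would also carry the simulated redshifts along with the features and would need a more careful treatment of photometric noise.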

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2508.20903/full.md

## Figures

28 figures with captions in the complete paper: https://tomesphere.com/paper/2508.20903/full.md

## References

4 references — full list in the complete paper: https://tomesphere.com/paper/2508.20903/full.md

---
Source: https://tomesphere.com/paper/2508.20903