Comparative analysis of missing data imputation methods for CSST survey: Impact on photometric redshift estimation performance

Ling Wang; Zhu Chen; Zhijian Luo; Liping Fu; Zuhui Fan; Wei Du; Yaoming Lei; Zhang Ban; Yuedong Fang; Yi Hu; Xin Ji; Guoliang Li; Xiaobo Li; Jiaqi Lin; Chenxiaoji Ling; Chao Liu; Dezi Liu; Changqing Luo; Yu Luo; Bin Ma; Xianmin Meng; Jundan Nie; Juanjuan Ren; Li Shao; Jianing Tang; Hao Tian; Feng Wang; Chengliang Wei; Peng Wei; Shoulin Wei; Kaichao Wu; You Wu; Yun-Ao Xiao; Zhou Xie; Yibo Yan; Su Yao; Yan Yu; Bo Zhang; Shengwen Zhang; Tianmeng Zhang; Xiaoli Zhang; Xin Zhang; Bowei Zhao; Zhimin Zhou; and Hu Zou

arXiv:2605.13219 · astro-ph.GA · May 14, 2026


TL;DR

This study systematically evaluates data imputation methods for improving photometric redshift estimation in the CSST survey, highlighting the strengths of KNN and SAITS under various missing data scenarios.

Contribution

The paper benchmarks machine learning (ML) and deep learning (DL) imputation models, identifies KNN and SAITS as the best performers, and discusses why domain consistency between training and test data and the handling of different missingness mechanisms matter.

Findings

01

KNN performs best under ideal MCAR conditions with complete training data.

02

SAITS outperforms KNN when training data is incomplete or in realistic mixed missingness scenarios.

03

Domain mismatch between training and testing data degrades imputation performance.
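
The setting in the first finding, k-nearest neighbors (KNN) imputation under missing completely at random (MCAR) masking with a complete training set, can be illustrated with a minimal sketch. This is not the paper's implementation: the function names `mcar_mask` and `knn_impute`, the toy magnitudes, and the choice to measure distance only over the observed bands are all assumptions made for illustration.

```python
import math
import random


def mcar_mask(rows, missing_rate, seed=0):
    """Hypothetical helper: drop each entry independently with the same
    probability, which is what MCAR means (missingness ignores the values)."""
    rng = random.Random(seed)
    return [
        [None if rng.random() < missing_rate else value for value in row]
        for row in rows
    ]


def knn_impute(incomplete, reference, k=3):
    """Hypothetical helper: fill None entries with the mean of the k nearest
    complete reference rows, using only the bands observed in each row."""
    filled = []
    for row in incomplete:
        observed = [i for i, value in enumerate(row) if value is not None]
        if observed:
            def distance(ref, row=row, observed=observed):
                # Root-mean-square difference over the observed bands only.
                total = sum((row[i] - ref[i]) ** 2 for i in observed)
                return math.sqrt(total / len(observed))

            neighbours = sorted(reference, key=distance)[:k]
        else:
            # Nothing observed: fall back to the mean of all reference rows.
            neighbours = reference
        filled.append([
            value if value is not None
            else sum(n[i] for n in neighbours) / len(neighbours)
            for i, value in enumerate(row)
        ])
    return filled


# Toy magnitudes in three made-up bands (assumed values, not CSST data).
training = [
    [20.0, 19.5, 19.0],
    [22.0, 21.5, 21.0],
    [24.0, 23.5, 23.0],
]
test_rows = [[20.1, 19.6, 19.1], [23.8, 23.4, 22.9]]

masked = mcar_mask(test_rows, missing_rate=0.3, seed=1)
imputed = knn_impute(masked, training, k=1)
print(masked)
print(imputed)
```

Because the reference set here is complete, every neighbour can supply a value for any missing band. When the training data itself has gaps, as in the second finding, this simple scheme no longer applies directly.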

Abstract

Improving the accuracy of photometric redshifts (photo-$z$) is essential for reliable statistical studies of cosmology and galaxy evolution. However, missing photometric bands are a common observational challenge that can significantly degrade photo-$z$ estimation accuracy. In this work, we present a systematic evaluation of data imputation methods aimed at improving photo-$z$ performance. We benchmark a range of representative machine learning (ML) and deep learning (DL) architectures, identifying k-nearest neighbors (KNN) and the attention-based SAITS model as the leading performers. These models are then applied to China Space Station Survey Telescope (CSST) mock data to assess their performance under realistic observational conditions. Our results show that KNN yields the highest accuracy under idealized missing completely at random (MCAR) conditions with complete training sets,…
