Comparative analysis of missing data imputation methods for CSST survey: Impact on photometric redshift estimation performance
Ling Wang, Zhu Chen, Zhijian Luo, Liping Fu, Zuhui Fan, Wei Du, Yaoming Lei, Zhang Ban, Yuedong Fang, Yi Hu, Xin Ji, Guoliang Li, Xiaobo Li, Jiaqi Lin, Chenxiaoji Ling, Chao Liu, Dezi Liu, Changqing Luo, Yu Luo, Bin Ma, Xianmin Meng, Jundan Nie, Juanjuan Ren, Li Shao

TL;DR
This study systematically evaluates data imputation methods for improving photometric redshift estimation in the CSST survey, highlighting the strengths of KNN and SAITS under various missing data scenarios.
Contribution
It benchmarks ML and DL imputation models, identifies the best performers, and discusses the importance of domain consistency and handling different missingness mechanisms.
Findings
KNN performs best under ideal MCAR conditions with complete training data.
SAITS outperforms KNN when training data is incomplete or in realistic mixed missingness scenarios.
Domain mismatch between training and testing data degrades imputation performance.
Abstract
Improving the accuracy of photometric redshifts (photo-z) is essential for reliable statistical studies of cosmology and galaxy evolution. However, missing photometric bands are a common observational challenge that can significantly degrade photo-z estimation accuracy. In this work, we present a systematic evaluation of data imputation methods aimed at improving photo-z performance. We benchmark a range of representative machine learning (ML) and deep learning (DL) architectures, identifying k-nearest neighbors (KNN) and the attention-based SAITS model as the leading performers. These models are then applied to China Space Station Survey Telescope (CSST) mock data to assess their performance under realistic observational conditions. Our results show that KNN yields the highest accuracy under idealized missing completely at random (MCAR) conditions with complete training sets,…
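As a rough illustration of the idealized setting described above (MCAR missingness, a complete training set, KNN imputation), the following toy sketch may help. It is not the authors' pipeline: the function names `mask_mcar` and `knn_impute`, the four-band synthetic magnitudes, the missing rate, and the choice of k are all assumptions made for this example.

```python
import math
import random


def mask_mcar(row, missing_rate, rng):
    """Hide each band independently with the same probability (MCAR)."""
    return [None if rng.random() < missing_rate else value for value in row]


def knn_impute(row, train, k=3):
    """Fill missing bands with the mean over the k nearest complete training rows.

    Distances use only the bands observed in `row`. If no band is observed,
    every training row ties and the first k rows are used.
    """
    observed = [i for i, value in enumerate(row) if value is not None]

    def distance(candidate):
        if not observed:
            return 0.0
        squared = sum((row[i] - candidate[i]) ** 2 for i in observed)
        return math.sqrt(squared / len(observed))

    neighbors = sorted(train, key=distance)[:k]
    return [
        value if value is not None else sum(n[i] for n in neighbors) / len(neighbors)
        for i, value in enumerate(row)
    ]


# Synthetic, fully observed "training" catalog: 10 objects, 4 hypothetical bands.
train = [[20.0 + 0.1 * j + 0.5 * band for band in range(4)] for j in range(10)]

rng = random.Random(0)  # fixed seed so the mask is reproducible
target = mask_mcar(train[3], missing_rate=0.3, rng=rng)
print("masked: ", target)
print("imputed:", knn_impute(target, train, k=3))
```

Because the training rows here are complete, each neighbor can supply a value for any missing band. When the training set itself has gaps, that guarantee no longer holds, which is the regime where the summary reports SAITS outperforming KNN.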
