Missing Data Imputation for Galaxy Redshift Estimation

Kieran J. Luken; Rabina Padhy; X. Rosalind Wang

arXiv:2111.13806·astro-ph.IM·November 30, 2021·1 cites

TL;DR

This paper evaluates various data imputation methods, including simple and complex algorithms, for handling missing data in galaxy redshift estimation, finding MICE to be most effective in reducing prediction error.

Contribution

It compares multiple imputation techniques for astronomical data and demonstrates that MICE yields the lowest prediction error in galaxy redshift estimation.

Findings

01 MICE achieves the lowest Root Mean Square Error.

02 GAIN performs better than simple methods but worse than MICE.

03 Imputation improves redshift prediction accuracy.

Abstract

Astronomical data is full of holes. While there are many reasons for this missing data, some of it is missing at random, caused by things like data corruption or unfavourable observing conditions. We test some simple data imputation methods (Mean, Median, Minimum, Maximum and k-Nearest Neighbours (kNN)), as well as two more complex methods (Multivariate Imputation by Chained Equations (MICE) and Generative Adversarial Imputation Network (GAIN)), against data in which increasing fractions of the values are randomly set to missing. We then use the imputed datasets to estimate the redshift of the galaxies, using the kNN and Random Forest machine learning (ML) techniques. We find that the MICE algorithm provides the lowest Root Mean Square Error and consequently the lowest prediction error, with the GAIN algorithm the next best.
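The evaluation loop described in the abstract (randomly set a growing fraction of values to missing, impute, then score the imputed values against the known truth) can be illustrated with a minimal sketch. This is not the authors' code; the official implementation is in the repository listed under Code & Models. The sketch assumes synthetic Gaussian data as a stand-in for real photometry, covers only the four simple column-wise methods (Mean, Median, Minimum, Maximum), and uses hypothetical helper names (`mask_at_random`, `impute_columnwise`, `rmse_on_missing`). It uses only the Python standard library so it runs without extra dependencies.

```python
import math
import random
import statistics


def mask_at_random(rows, fraction, rng):
    """Return a copy of rows with roughly `fraction` of entries set to None."""
    return [[None if rng.random() < fraction else v for v in row] for row in rows]


def impute_columnwise(rows, strategy):
    """Fill each None with strategy(observed values in the same column)."""
    n_cols = len(rows[0])
    fills = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        fills.append(strategy(observed))
    return [[fills[j] if v is None else v for j, v in enumerate(row)] for row in rows]


def rmse_on_missing(truth, masked, imputed):
    """RMSE between true and imputed values, over the masked entries only."""
    sq_errors = [
        (t - i) ** 2
        for t_row, m_row, i_row in zip(truth, masked, imputed)
        for t, m, i in zip(t_row, m_row, i_row)
        if m is None
    ]
    return math.sqrt(sum(sq_errors) / len(sq_errors)) if sq_errors else 0.0


rng = random.Random(0)

# Assumption: synthetic stand-in data, 200 objects with 4 features each.
truth = [[rng.gauss(20.0, 1.5) for _ in range(4)] for _ in range(200)]

strategies = {
    "mean": statistics.mean,
    "median": statistics.median,
    "min": min,
    "max": max,
}

results = {}
for fraction in (0.05, 0.1, 0.2):
    masked = mask_at_random(truth, fraction, rng)
    for name, strategy in strategies.items():
        imputed = impute_columnwise(masked, strategy)
        results[(fraction, name)] = rmse_on_missing(truth, masked, imputed)
        print(f"missing={fraction:.2f} method={name:<6} RMSE={results[(fraction, name)]:.3f}")
```

Scoring only the masked entries keeps the untouched values from diluting the error. The paper's further step, training kNN and Random Forest redshift estimators on each imputed dataset, is not reproduced here.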

Code & Models

Repositories

kluken/redshift_imputation (Official)

Taxonomy

Topics: Advanced Statistical Methods and Models · Galaxies: Formation, Evolution, Phenomena · Gaussian Processes and Bayesian Inference