Using Multivariate Imputation by Chained Equations to Predict Redshifts of Active Galactic Nuclei
Spencer James Gibson, Aditya Narendra, Maria Giovanna Dainotti, Malgorzata Bogdan, Agnieszka Pollo, Artem Poliszczuk, Enrico Rinaldi, and Ioannis Liodakis

TL;DR
This paper demonstrates that using the MICE imputation method to handle missing data significantly improves machine learning models' ability to predict the redshifts of active galactic nuclei from large survey data.
Contribution
It introduces the application of MICE for imputing missing data in AGN redshift prediction, enhancing ML model accuracy in astronomical datasets.
Findings
MICE effectively imputes the 24% of missing data in the 4LAC catalog.
Imputation with MICE improves ML redshift estimation accuracy.
The approach facilitates better utilization of incomplete astronomical data.
Abstract
Redshift measurement of active galactic nuclei (AGNs) remains a time-consuming and challenging task, as it requires follow-up spectroscopic observations and detailed analysis. Hence, there exists an urgent requirement for alternative redshift estimation techniques. The use of machine learning (ML) for this purpose has been growing over the last few years, primarily due to the availability of large-scale galactic surveys. However, due to observational errors, a significant fraction of these data sets often have missing entries, rendering that fraction unusable for ML regression applications. In this study, we demonstrate the performance of an imputation technique called Multivariate Imputation by Chained Equations (MICE), which rectifies the issue of missing data entries by imputing them using the available information in the catalog. We use the Fermi-LAT Fourth Data Release Catalog…
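The chained-equations idea behind MICE can be illustrated with a minimal sketch. This is not the authors' implementation: it is a simplified, deterministic variant written with NumPy, in which each column with missing entries is regressed in turn on all the other columns, and the cycle is repeated several times. Full MICE additionally draws imputed values from a predictive distribution (commonly via predictive mean matching) and produces multiple imputed data sets. The function name `chained_imputation`, the `n_iter` parameter, and the toy data are hypothetical and chosen only for illustration.

```python
import numpy as np


def chained_imputation(X, n_iter=10):
    """Fill NaNs by cycling linear regressions over columns.

    Simplified, deterministic sketch of the chained-equations idea;
    real MICE adds random draws and multiple imputations.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    filled = X.copy()

    # Step 1: start from the mean of the observed values in each column.
    col_means = np.nanmean(X, axis=0)
    filled[missing] = np.take(col_means, np.where(missing)[1])

    # Step 2: repeatedly re-predict each incomplete column from the others.
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            miss_j = missing[:, j]
            if not miss_j.any():
                continue  # nothing to impute in this column
            others = np.delete(filled, j, axis=1)
            design = np.column_stack([np.ones(len(others)), others])
            # Fit only on rows where column j was actually observed.
            coef, *_ = np.linalg.lstsq(
                design[~miss_j], filled[~miss_j, j], rcond=None
            )
            filled[miss_j, j] = design[miss_j] @ coef
    return filled


# Toy example (made-up numbers, not catalog data): two correlated features.
x = np.arange(10.0)
toy = np.column_stack([x, 2.0 * x + 1.0])
toy[3, 1] = np.nan
toy[7, 0] = np.nan
completed = chained_imputation(toy)
```

Observed entries are left untouched; only the positions that were `NaN` in the input are overwritten. In a redshift-estimation setting, the completed table would then be passed to the ML regressor in place of the reduced table that discards incomplete rows.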
Taxonomy
Topics: Statistical and numerical algorithms · Data Analysis with R
