Imputation procedures in surveys using nonparametric and machine learning methods: an empirical comparison

Mehdi Dagdoug; Camelia Goga; David Haziza

arXiv:2007.06298 · stat.ME · August 23, 2022

TL;DR

This paper empirically compares various nonparametric and machine learning imputation methods for survey data, demonstrating their effectiveness in handling high-dimensional and complex datasets with item nonresponse.

Contribution

It provides an extensive empirical evaluation of machine learning-based imputation procedures, highlighting their advantages over traditional methods in diverse data settings.

Findings

1. Several machine learning methods show low bias and high efficiency.
2. Some procedures outperform traditional imputation methods on complex, high-dimensional data.
3. The results support using advanced algorithms for imputation in surveys.

Abstract

Nonparametric and machine learning methods are flexible tools for obtaining accurate predictions. Nowadays, data sets with a large number of predictors and complex structures are fairly common. In the presence of item nonresponse, nonparametric and machine learning procedures may thus provide a useful alternative to traditional imputation procedures for deriving a set of imputed values. In this paper, we conduct an extensive empirical investigation that compares a number of imputation procedures in terms of bias and efficiency in a wide variety of settings, including high-dimensional data sets. The results suggest that a number of machine learning procedures perform very well in terms of bias and efficiency.
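A small Monte Carlo sketch can illustrate what comparing imputation procedures "in terms of bias and efficiency" involves. Every setting below is an assumption chosen for illustration and is not the paper's simulation design. The sketch uses a hypothetical superpopulation model with a nonlinear relationship between a single predictor x and the survey variable y. It replaces a finite-population sampling design with independent draws. The response probability depends on x only, and the number of neighbours is fixed at k = 5. The sketch contrasts two traditional procedures, respondent-mean imputation and linear regression imputation, with k-nearest-neighbour imputation, which serves here as one example of a nonparametric method. For each procedure it reports the Monte Carlo bias and root mean squared error of the imputed mean.

```python
import numpy as np

# E[y] under the hypothetical model below, since E[sin(2*pi*X)] = 0 for X ~ U(0, 1).
TRUE_MEAN = 2.0


def simulate_sample(rng, n):
    """Draw one sample from a hypothetical nonlinear model with MAR item nonresponse."""
    x = rng.uniform(0.0, 1.0, n)
    y = TRUE_MEAN + 3.0 * np.sin(2.0 * np.pi * x) + rng.normal(0.0, 0.5, n)
    # Response probability depends on x only (missing at random given x):
    # units with large x are less likely to respond.
    p_respond = 1.0 / (1.0 + np.exp(-(2.0 - 3.0 * x)))
    responded = rng.uniform(size=n) < p_respond
    return x, y, responded


def impute_mean(x_r, y_r, x_m):
    # Traditional: every nonrespondent receives the respondent mean.
    return np.full(x_m.shape, y_r.mean())


def impute_linear(x_r, y_r, x_m):
    # Traditional parametric: OLS line fitted on respondents, then predicted.
    slope, intercept = np.polyfit(x_r, y_r, 1)
    return intercept + slope * x_m


def impute_knn(x_r, y_r, x_m, k=5):
    # Nonparametric: average y of the k respondents closest in x.
    dist = np.abs(x_m[:, None] - x_r[None, :])
    nearest = np.argsort(dist, axis=1)[:, :k]
    return y_r[nearest].mean(axis=1)


def monte_carlo(n=500, n_reps=200, seed=1):
    """Monte Carlo bias and RMSE of the imputed mean for each procedure."""
    rng = np.random.default_rng(seed)
    methods = {"mean": impute_mean, "linear": impute_linear, "knn": impute_knn}
    estimates = {name: [] for name in methods}
    for _ in range(n_reps):
        x, y, r = simulate_sample(rng, n)
        for name, impute in methods.items():
            # Keep observed values, fill in the nonrespondents' values.
            y_completed = y.copy()
            y_completed[~r] = impute(x[r], y[r], x[~r])
            estimates[name].append(y_completed.mean())
    summary = {}
    for name, values in estimates.items():
        errors = np.asarray(values) - TRUE_MEAN
        summary[name] = {"bias": errors.mean(), "rmse": np.sqrt(np.mean(errors ** 2))}
    return summary


if __name__ == "__main__":
    for name, stats in monte_carlo().items():
        print(f"{name:>6}: bias={stats['bias']:+.3f}  rmse={stats['rmse']:.3f}")
```

Nonresponse is more frequent among units with large x, where y tends to be smaller. As a result, respondent-mean imputation overstates the mean. The nearest-neighbour imputer borrows y values from respondents with similar x and largely removes that bias. Exact numbers depend on the seed and on the assumed model.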
