Solving the "many variables" problem in MICE with principal component regression

Edoardo Costantini; Kyle M. Lang; Klaas Sijtsma; Tim Reeskens

arXiv:2206.15107·stat.ME·April 24, 2023

TL;DR

This paper introduces a PCR-based method within MICE to automatically handle the challenge of many predictor variables in large social science datasets, showing competitive performance through simulations and case studies.

Contribution

It proposes using principal component regression as an automatic predictor selection method in MICE for high-dimensional data imputation.

Findings

01. PCR-based MICE performs best among tested methods.

02. PCR can match expert-designed imputation procedures.

03. Method is effective in large social science datasets.

Abstract

Multiple Imputation (MI) is one of the most popular approaches to addressing missing values in questionnaires and surveys. MI with multivariate imputation by chained equations (MICE) allows flexible imputation of many types of data. In MICE, for each variable under imputation, the imputer needs to specify which variables should act as predictors in the imputation model. The selection of these predictors is a difficult, but fundamental, step in the MI procedure, especially when there are many variables in a data set. In this project, we explore the use of principal component regression (PCR) as a univariate imputation method in the MICE algorithm to automatically address the "many variables" problem that arises when imputing large social science data. We compare different implementations of PCR-based MICE with a correlation-thresholding strategy by means of a Monte Carlo simulation study…
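To make the idea of PCR as a univariate imputation method concrete, the following is a minimal sketch of a single imputation step, not the authors' implementation. It assumes the predictor matrix is already complete (in MICE, missing predictors would hold their current imputations), uses a fixed number of components, and adds normal residual noise without drawing the regression parameters from their posterior, which a proper MI routine would do. The function name `pcr_impute`, its arguments, and the simulated data are hypothetical and chosen for illustration only.

```python
import numpy as np


def pcr_impute(y, X, n_components, rng):
    """Impute NaNs in y from the first principal components of X (illustrative sketch)."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    observed = ~np.isnan(y)

    # Standardize the predictors (assumes no constant columns),
    # then obtain the principal axes from the SVD.
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    _, _, Vt = np.linalg.svd(Xs, full_matrices=False)
    scores = Xs @ Vt[:n_components].T

    # Regress the observed part of y on the retained component scores.
    Z = np.column_stack([np.ones(len(y)), scores])
    beta, *_ = np.linalg.lstsq(Z[observed], y[observed], rcond=None)
    resid = y[observed] - Z[observed] @ beta
    sigma = np.sqrt(resid @ resid / (observed.sum() - Z.shape[1]))

    # Fill missing entries with the prediction plus residual noise.
    # Simplification: beta and sigma are treated as fixed, not sampled.
    y_imputed = y.copy()
    n_missing = int((~observed).sum())
    y_imputed[~observed] = Z[~observed] @ beta + rng.normal(0.0, sigma, size=n_missing)
    return y_imputed


# Hypothetical data: 30 noisy predictors driven by 2 latent dimensions.
rng = np.random.default_rng(0)
n, p = 200, 30
latent = rng.normal(size=(n, 2))
X = latent @ rng.normal(size=(2, p)) + 0.5 * rng.normal(size=(n, p))
y = latent[:, 0] + 0.3 * rng.normal(size=n)

y_missing = y.copy()
y_missing[rng.random(n) < 0.3] = np.nan  # roughly 30% missing completely at random

y_completed = pcr_impute(y_missing, X, n_components=2, rng=rng)
```

In a full MICE run, a step like this would be repeated for each incomplete variable over several iterations and for each of the multiple imputed data sets. The point of the sketch is that the imputer supplies all available predictors and the components summarize them, so no manual predictor selection is needed.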

Taxonomy

Topics: Sensory Analysis and Statistical Methods