# Matrix Completion for Survey Data Prediction with Multivariate Missingness

**Authors:** Xiaojun Mao, Zhonglei Wang, Shu Yang

arXiv: 1907.08360 · 2019-08-06

## TL;DR

This paper proposes a matrix completion-based imputation method for survey data with multivariate missingness. The method exploits row and column patterns jointly rather than imputing row-wise or column-wise, and it is evaluated in simulations and on the 2015-2016 NHANES Questionnaire Data.

## Contribution

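The paper develops a matrix completion approach that imputes the whole data matrix at once. The population data matrix follows a column-space-decomposition model: easy-to-obtain demographic covariates plus a low-rank residual matrix. Because these parameters are not identified in the sample data matrix, the authors propose a projection strategy that identifies them uniquely, together with computationally efficient penalized estimators.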

## Key findings

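- In simulations, the doubly robust estimator that uses the proposed matrix completion for imputation has smaller mean squared error than competing estimators.
- The proposed penalized estimators are computationally efficient and possess desired statistical properties.
- The method is applied to the 2015-2016 NHANES Questionnaire Data to demonstrate practical relevance.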
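
To make the term "doubly robust estimator" concrete, the sketch below implements a generic augmented inverse-probability-weighted mean for one survey variable. It is an illustration under stated assumptions, not necessarily the exact estimator in the paper: the function name `doubly_robust_mean`, its arguments, and the normalization by the sum of the weights are all hypothetical choices. Here `y_hat` stands for imputed values (for example, from a matrix completion step) and `p_hat` for estimated response probabilities.

```python
import numpy as np

def doubly_robust_mean(y, observed, y_hat, p_hat, weights):
    """Generic doubly robust (AIPW-style) mean for one variable.

    Assumed inputs (hypothetical names, not taken from the paper):
      y        : responses; entries where observed is False are ignored
      observed : boolean response indicators
      y_hat    : imputed or predicted values for every sampled unit
      p_hat    : estimated response probabilities, strictly positive
      weights  : survey (design) weights
    """
    y = np.asarray(y, dtype=float)
    observed = np.asarray(observed, dtype=bool)
    y_hat = np.asarray(y_hat, dtype=float)
    p_hat = np.asarray(p_hat, dtype=float)
    weights = np.asarray(weights, dtype=float)

    # Inverse-probability-weighted residual correction, applied to respondents only.
    correction = np.where(observed, (y - y_hat) / p_hat, 0.0)

    # Weighted mean of prediction plus correction (normalized by the weight total).
    return np.sum(weights * (y_hat + correction)) / np.sum(weights)
```

In this form the estimate stays consistent if either the imputation model or the response model is correctly specified, which is the usual meaning of "doubly robust".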

## Abstract

The National Health and Nutrition Examination Survey (NHANES) studies the nutritional and health status over the whole U.S. population with comprehensive physical examinations and questionnaires. However, survey data analyses become challenging due to inevitable missingness in almost all variables. In this paper, we develop a new imputation method to deal with multivariate missingness at random using matrix completion. In contrast to existing imputation schemes either conducting row-wise or column-wise imputation, we treat the data matrix as a whole which allows exploiting both row and column patterns to impute the missing values in the whole data matrix at one time. We adopt a column-space-decomposition model for the population data matrix with easy-to-obtain demographic data as covariates and a low-rank structured residual matrix. A unique challenge arises due to lack of identification of parameters in the sample data matrix. We propose a projection strategy to uniquely identify the parameters and corresponding penalized estimators, which are computationally efficient and possess desired statistical properties. The simulation study shows that the doubly robust estimator using the proposed matrix completion for imputation has smaller mean squared error than other competitors. To demonstrate practical relevance, we apply the proposed method to the 2015-2016 NHANES Questionnaire Data.
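
The idea of combining covariates with a low-rank residual can be sketched in a few lines. The code below is a minimal soft-impute-style illustration, not the authors' estimator: it assumes a model of the form "covariate part plus low-rank residual", uses the orthogonal projector onto the column space of the covariates to separate the two parts, and applies singular-value soft-thresholding to the residual. The function names, the threshold `tau`, the iteration count, and the absence of any penalty on the covariate part are all assumptions made for illustration.

```python
import numpy as np

def projection_onto_columns(X):
    """Orthogonal projector onto the column space of X (pinv tolerates rank deficiency)."""
    return X @ np.linalg.pinv(X)

def soft_threshold_svd(M, tau):
    """Shrink the singular values of M by tau, which encourages low rank."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s = np.maximum(s - tau, 0.0)
    return (U * s) @ Vt

def impute_with_covariates(Y, mask, X, tau=1.0, n_iter=200):
    """Hypothetical helper: fill in Y where mask is False.

    Y    : n x L data matrix (values where mask is False are ignored)
    mask : n x L boolean array, True where Y is observed
    X    : n x p fully observed covariate matrix
    Returns the completed matrix, its covariate part, and its low-rank residual.
    """
    n = Y.shape[0]
    P = projection_onto_columns(X)   # projects onto the column space of X
    Q = np.eye(n) - P                # projects onto its orthogonal complement
    A_hat = np.where(mask, Y, 0.0)   # start with zeros in the missing cells
    for _ in range(n_iter):
        # Keep observed entries, plug the current fit into the missing ones.
        filled = np.where(mask, Y, A_hat)
        covariate_part = P @ filled                    # part explained by X
        residual = soft_threshold_svd(Q @ filled, tau) # low-rank part orthogonal to X
        A_hat = covariate_part + residual
    return A_hat, covariate_part, residual
```

Forcing the residual to be orthogonal to the column space of `X` is what keeps the two parts from trading off against each other in this sketch; how the paper achieves identification in the sample data matrix is described in the full text.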

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1907.08360/full.md

## References

53 references — full list in the complete paper: https://tomesphere.com/paper/1907.08360/full.md

---
Source: https://tomesphere.com/paper/1907.08360