Expanding tidy data principles to facilitate missing data exploration, visualization and assessment of imputations

Nicholas J Tierney; Dianne H Cook

arXiv:1809.02264 · stat.CO · May 18, 2020 · J. Stat. Softw.

TL;DR

This paper introduces a new framework based on tidy data principles to improve handling, exploration, visualization, and imputation of missing data, integrated into the R package `naniar`.

Contribution

It extends tidy data principles with a new data structure and operations specifically designed for missing data management and analysis.
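As a rough illustration of what a missing-data structure kept alongside tidy data could look like, the sketch below gives every variable a companion "shadow" column that records whether its value is missing. The `_NA` suffix and the `"NA"` / `"!NA"` labels follow the shadow-column convention described in naniar's documentation, but `add_shadow_columns` is a hypothetical pure-Python helper, not the package's API. Rows are assumed to be plain dicts with `None` marking a missing value, and the data are made-up toy values.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def add_shadow_columns(rows: List[Row]) -> List[Row]:
    """Hypothetical helper: give each variable `x` a shadow column `x_NA`.

    The shadow value is "NA" when the original value is missing (None here)
    and "!NA" otherwise. The original values are left untouched.
    """
    # Collect variable names in first-seen order so the output layout is stable.
    variables: List[str] = []
    for row in rows:
        for name in row:
            if name not in variables:
                variables.append(name)

    shadowed: List[Row] = []
    for row in rows:
        new_row = dict(row)  # copy, so the caller's data is not mutated
        for name in variables:
            # A key absent from the row is treated the same as an explicit None.
            new_row[f"{name}_NA"] = "NA" if row.get(name) is None else "!NA"
        shadowed.append(new_row)
    return shadowed


# Made-up toy rows; None marks a missing value.
toy_rows = [
    {"ozone": 41, "solar_r": 190},
    {"ozone": None, "solar_r": 118},
    {"ozone": 12, "solar_r": None},
]
for r in add_shadow_columns(toy_rows):
    print(r)
```

Because the shadow columns live in the same rows as the data, later filtering, grouping, or plotting can condition on missingness without a separate lookup table.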

Findings

1. Provides a connected framework for missing data exploration and imputation.
2. Introduces a new data structure and operations for missing data handling.
3. Makes the methods available in the R package `naniar`.
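One way to see how a shared structure links exploration and imputation: if missingness flags are recorded before imputing, they survive the imputation and still mark which values were filled in. The sketch below uses simple mean imputation purely as a placeholder method (it is not a method the paper prescribes). The helper names are hypothetical, rows are plain dicts with `None` for missing, and the data are made-up toy values.

```python
from statistics import mean
from typing import Any, Dict, List

Row = Dict[str, Any]


def shadow_then_mean_impute(rows: List[Row], column: str) -> List[Row]:
    """Hypothetical helper: flag missing values in `column`, then fill them with the mean.

    The flag column `<column>_NA` is written before imputation, so it still
    identifies the imputed entries afterwards.
    """
    observed = [r[column] for r in rows if r.get(column) is not None]
    fill = mean(observed)  # raises statistics.StatisticsError if nothing is observed
    out: List[Row] = []
    for r in rows:
        new_r = dict(r)
        missing = r.get(column) is None
        new_r[f"{column}_NA"] = "NA" if missing else "!NA"
        if missing:
            new_r[column] = fill
        out.append(new_r)
    return out


def split_by_flag(rows: List[Row], column: str) -> Dict[str, List[Any]]:
    """Group the (possibly imputed) values of `column` by their missingness flag."""
    groups: Dict[str, List[Any]] = {"!NA": [], "NA": []}
    for r in rows:
        groups[r[f"{column}_NA"]].append(r[column])
    return groups


toy_rows = [{"ozone": 40}, {"ozone": None}, {"ozone": 20}, {"ozone": None}]
imputed = shadow_then_mean_impute(toy_rows, "ozone")
print(split_by_flag(imputed, "ozone"))
```

Comparing the two groups (for instance in side-by-side histograms) makes it visible that mean imputation places every filled value at a single point, which is the kind of check that assessing imputations calls for.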

Abstract

Despite the large body of research on missing value distributions and imputation, there is comparatively little literature with a focus on how to make it easy to handle, explore, and impute missing values in data. This paper addresses this gap. The new methodology builds upon tidy data principles, with the goal of integrating missing value handling as a key part of data analysis workflows. We define a new data structure, and a suite of new operations. Together, these provide a connected framework for handling, exploring, and imputing missing values. These methods are available in the R package `naniar`.
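As a small example of a missingness-exploration operation, the sketch below tabulates, for each variable, the number and percentage of missing values, sorted from most to least missing. It is loosely modeled on the per-variable summaries naniar offers (such as `miss_var_summary()`), but `missing_by_variable` is a hypothetical pure-Python helper, not a port of that function. Rows are assumed to be dicts with `None` meaning missing, and the data are made-up toy values.

```python
from typing import Any, Dict, List, Tuple


def missing_by_variable(rows: List[Dict[str, Any]]) -> List[Tuple[str, int, float]]:
    """Hypothetical helper: (variable, n_missing, pct_missing), most missing first."""
    variables: List[str] = []
    for row in rows:
        for name in row:
            if name not in variables:
                variables.append(name)

    n_rows = len(rows)
    summary = []
    for name in variables:
        # Absent keys count as missing, the same as an explicit None.
        n_miss = sum(1 for row in rows if row.get(name) is None)
        pct = 100.0 * n_miss / n_rows if n_rows else 0.0
        summary.append((name, n_miss, pct))
    # sorted() is stable even with reverse=True, so ties keep first-seen order.
    return sorted(summary, key=lambda item: item[1], reverse=True)


toy_rows = [
    {"ozone": None, "wind": 7.4, "temp": 67},
    {"ozone": 36, "wind": None, "temp": 72},
    {"ozone": None, "wind": 12.6, "temp": 74},
    {"ozone": 18, "wind": 11.5, "temp": 62},
]
for name, n_miss, pct in missing_by_variable(toy_rows):
    print(f"{name:<6} {n_miss:>2} {pct:5.1f}%")
```

Returning plain tuples in a fixed column order keeps the summary itself tidy: one row per variable, one column per quantity.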
