ITI-IQA: a Toolbox for Heterogeneous Univariate and Multivariate Missing Data Imputation Quality Assessment

Pedro Pons-Suñer; Laura Arnal; J. Ramón Navarro-Cerdán; François Signol

arXiv:2407.11767 · cs.LG · July 17, 2024 · 2 cites

Open Access

TL;DR

The paper introduces ITI-IQA, a comprehensive toolbox for assessing and improving the quality of univariate and multivariate data imputation, ensuring more reliable handling of missing data in diverse data types.

Contribution

It presents a novel, trainable pipeline with statistical tests and diagnostic tools for selecting and validating imputation methods across various data types.

Findings

1. Supports continuous, discrete, binary, and categorical data.
2. Provides statistical evaluation to prevent bias.
3. Includes graphical tools for result verification.

Abstract

Missing values are a major challenge in most data science projects working on real data. To avoid losing valuable information, imputation methods are used to fill in missing values with estimates, allowing the preservation of samples or variables that would otherwise be discarded. However, if the process is not well controlled, imputation can generate spurious values that introduce uncertainty and bias into the learning process. The abundance of univariate and multivariate imputation techniques, along with the complex trade-off between data reliability and preservation, makes it difficult to determine the best course of action to tackle missing values. In this work, we present ITI-IQA (Imputation Quality Assessment), a set of utilities designed to assess the reliability of various imputation methods, select the best imputer for any feature or group of features, and filter out features…
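The idea of assessing imputers and selecting the best one per feature can be illustrated with a minimal sketch. This is not the ITI-IQA API: the function names (`mean_impute`, `median_impute`, `masked_rmse`, `select_imputer`), the mask-and-recover scoring scheme, and the RMSE criterion are all assumptions chosen for illustration, and the example covers a single continuous feature only.

```python
import numpy as np


def mean_impute(column):
    """Fill NaNs with the mean of the observed values."""
    filled = column.copy()
    filled[np.isnan(filled)] = np.nanmean(column)
    return filled


def median_impute(column):
    """Fill NaNs with the median of the observed values."""
    filled = column.copy()
    filled[np.isnan(filled)] = np.nanmedian(column)
    return filled


def masked_rmse(column, imputer, mask_fraction=0.2, seed=0):
    """Hide a fraction of the observed values, impute them, and
    return the RMSE between the imputed and the hidden true values."""
    rng = np.random.default_rng(seed)
    observed = np.flatnonzero(~np.isnan(column))
    n_hide = max(1, int(len(observed) * mask_fraction))
    hidden = rng.choice(observed, size=n_hide, replace=False)

    masked = column.copy()
    masked[hidden] = np.nan  # artificially remove known values
    imputed = imputer(masked)
    return float(np.sqrt(np.mean((imputed[hidden] - column[hidden]) ** 2)))


def select_imputer(column, imputers, **kwargs):
    """Score every candidate imputer on the same masked entries and
    return the name of the lowest-error one plus all scores."""
    scores = {name: masked_rmse(column, fn, **kwargs) for name, fn in imputers.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical candidate set; a real toolbox would offer many more methods.
IMPUTERS = {"mean": mean_impute, "median": median_impute}

if __name__ == "__main__":
    feature = np.array([1.0, 2.0, np.nan, 4.0, 5.0, np.nan, 7.0, 8.0, 9.0, 100.0])
    best, scores = select_imputer(feature, IMPUTERS)
    print(best, scores)
```

Every candidate is scored on the same hidden entries (fixed `seed`), so the comparison between imputers is not confounded by which values happened to be masked. A single random mask gives a noisy estimate; repeating the procedure over several seeds would be a natural extension.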

Taxonomy

Topics: Data Analysis with R

Methods: Sparse Evolutionary Training