Une nouvelle approche de complétion des valeurs manquantes dans les bases de données

Leila Ben Othman

arXiv:1901.00671·cs.DB·January 4, 2019

Open Access

TL;DR

This paper introduces a method that imputes missing values in datasets by leveraging association rules, reducing conflicts during completion and improving imputation accuracy.

Contribution

It combines generic bases of association rules with missing-value completion and introduces a 'Robustness' metric that selects the most reliable rule whenever several rules conflict.

Findings

01 Reduces conflicts during data completion
02 Achieves high accuracy in missing value imputation
03 Validated on benchmark datasets

Abstract

Real-life datasets commonly contain missing values scattered throughout the data. Treated as 'dirty data', such records are usually removed during a pre-processing step. Starting from the premise that 'making up this missing data is better than throwing it away', we present a new approach for completing missing data. Its main distinguishing feature is that it highlights a fruitful synergy between generic bases of association rules and the handling of missing values. Beyond their interesting compactness rate, such generic association rules considerably reduce the number of conflicts during the completion step. A new metric called 'Robustness' is also introduced; it aims to select the most robust association rule for completing a missing value whenever a conflict appears. Experiments carried out on benchmark…
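
The completion step described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the `Rule` class, the `applicable_rules`, `score`, and `complete` functions, and the toy weather rules are all hypothetical. In particular, the abstract does not define the 'Robustness' metric, so `score` below uses rule confidence (with premise length as a tie-breaker) purely as a stand-in for whatever ranking resolves a conflict.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]  # None marks a missing value


@dataclass
class Rule:
    """Hypothetical association rule: premise => (attribute = value)."""
    premise: Dict[str, str]
    attribute: str
    value: str
    confidence: float


def applicable_rules(row: Row, attribute: str, rules: List[Rule]) -> List[Rule]:
    """Rules that conclude on `attribute` and whose premise matches the known values of `row`."""
    return [
        r for r in rules
        if r.attribute == attribute
        and all(row.get(a) == v for a, v in r.premise.items())
    ]


def score(rule: Rule):
    # ASSUMPTION: stand-in for the paper's 'Robustness' metric, which the
    # abstract does not define. Here: confidence, then premise length.
    return (rule.confidence, len(rule.premise))


def complete(row: Row, rules: List[Rule]) -> Row:
    """Return a copy of `row` with missing values filled where some rule applies."""
    completed = dict(row)
    for attribute, value in row.items():
        if value is not None:
            continue
        candidates = applicable_rules(row, attribute, rules)
        if candidates:
            # A conflict arises when candidates propose different values;
            # the highest-scoring rule decides.
            completed[attribute] = max(candidates, key=score).value
    return completed


# Toy rules, invented for illustration only.
rules = [
    Rule({"outlook": "sunny"}, "play", "no", 0.60),
    Rule({"outlook": "sunny", "humidity": "normal"}, "play", "yes", 0.90),
    Rule({"outlook": "rainy"}, "play", "yes", 0.70),
]

row = {"outlook": "sunny", "humidity": "normal", "play": None}
print(complete(row, rules))
```

In this toy case two rules match the row and disagree on `play`; the stand-in score picks the rule with the higher confidence. A value stays missing when no rule applies, which keeps the sketch from inventing data it has no evidence for.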

Taxonomy

Topics: Data Mining Algorithms and Applications · Rough Sets and Fuzzy Logic · Data Management and Algorithms