Dimensional Data KNN-Based Imputation
Yuzhao Yang (IRIT), Jérôme Darmont (ERIC), Franck Ravat (IRIT), Olivier Teste (IRIT)

TL;DR
This paper introduces an imputation method for data warehouse dimensions that applies a hierarchical imputation step followed by a KNN-based step, taking the warehouse structure and dependency constraints into account when filling in missing values.
Contribution
It presents a dimension-specific imputation technique that integrates hierarchical and KNN-based imputation, whereas existing imputation methods are mainly adapted for facts.
Findings
Effective in handling missing dimension data
Respects data warehouse structure and constraints
Demonstrates efficiency in experimental tests
Abstract
Data Warehouses (DWs) are core components of Business Intelligence (BI). Missing data in DWs have a great impact on data analyses. Therefore, missing data need to be completed. Unlike other existing data imputation methods mainly adapted for facts, we propose a new imputation method for dimensions. This method contains two steps: 1) a hierarchical imputation and 2) a k-nearest neighbors (KNN) based imputation. Our solution has the advantage of taking into account the DW structure and dependency constraints. Experimental assessments validate our method in terms of effectiveness and efficiency.
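The two steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not give the details, so everything here is an assumption made for illustration. That covers the hypothetical dimension with levels `city`, `country`, `continent`, the extra `language` attribute, the function names `hierarchical_impute` and `knn_impute`, the mismatch-count distance, and the majority vote. Step 1 fills a missing parent level from another row that shares the same child value. Step 2 fills what remains from the k most similar complete rows, and the hierarchical step is then rerun so the newly imputed value propagates up the hierarchy.

```python
from collections import Counter

# Hypothetical hierarchy levels, ordered from finest to coarsest.
LEVELS = ["city", "country", "continent"]


def hierarchical_impute(rows, levels=LEVELS):
    """Fill a missing parent value when another row with the same child value has it."""
    for child, parent in zip(levels, levels[1:]):
        known = {}
        for r in rows:
            if r[child] is not None and r[parent] is not None:
                known.setdefault(r[child], r[parent])
        for r in rows:
            if r[parent] is None and r[child] in known:
                r[parent] = known[r[child]]
    return rows


def knn_impute(rows, target, features, k=3):
    """Fill a missing target from the majority value among the k nearest complete rows."""
    # Only rows that already have the target act as donors.
    donors = [r for r in rows if r[target] is not None]
    for r in rows:
        if r[target] is not None:
            continue

        def dist(other):
            # Assumed distance: number of mismatching features (missing counts as a mismatch).
            return sum(1 for f in features if r[f] is None or r[f] != other[f])

        neighbours = sorted(donors, key=dist)[:k]
        if neighbours:
            votes = Counter(n[target] for n in neighbours)
            r[target] = votes.most_common(1)[0][0]
    return rows


rows = [
    {"city": "Toulouse", "country": "France", "continent": "Europe", "language": "French"},
    {"city": "Toulouse", "country": None, "continent": None, "language": "French"},
    {"city": "Lyon", "country": "France", "continent": "Europe", "language": "French"},
    {"city": "Nantes", "country": None, "continent": None, "language": "French"},
    {"city": "Osaka", "country": "Japan", "continent": "Asia", "language": "Japanese"},
]

hierarchical_impute(rows)                                   # step 1: use the hierarchy
knn_impute(rows, target="country", features=["language"])   # step 2: KNN on what remains
hierarchical_impute(rows)                                   # propagate imputed values upward

for r in rows:
    print(r)
```

In this toy data, the second Toulouse row is resolved by the hierarchy alone, while the Nantes row has no matching city and falls back to its nearest neighbours. Rerunning the hierarchical pass after the KNN step is one simple way to keep imputed parent levels consistent with the dependency between `country` and `continent`.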
