Topology-Aware Subset Repair via Entropy-Guided Density and Graph Decomposition
Guoqi Zhao, Xixian Han, Xiaolong Wan

TL;DR
This paper introduces a topology-aware subset repair framework that combines density, conflict analysis, and graph decomposition to improve data cleaning accuracy and efficiency.
Contribution
It proposes a novel joint density-conflict penalty model with topology-aware conflict detection and dynamic attribute weighting, along with scalable algorithms for subset repair.
Findings
Improves repair accuracy and robustness.
Reduces homogeneity bias in density-based methods.
Enhances scalability through graph decomposition.
Abstract
Subset repair is an important data cleaning technique that enforces integrity constraints by deleting a minimal number of conflicting tuples, yet multiple minimal repairs often exist. Density-based methods address this ambiguity by favoring repairs that preserve dense, high-quality data regions; however, their effectiveness is limited by density bias from dirty clusters, high computational cost, and uniform attribute weighting. We propose a topology-aware approximate subset repair framework based on a joint density-conflict penalty model. The framework integrates three key components. First, a two-layer conflict detection strategy combines attribute inverted indexes with CFD rule grouping to efficiently identify violations. Second, we introduce EntroCFDensity, a density metric that incorporates information entropy and CFD weights to dynamically adjust attribute importance and reduce…
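To make the terms above concrete, the following is a minimal sketch of subset repair on a conflict graph. It is not the paper's algorithm: it assumes a single plain functional dependency (zip determines city) instead of CFDs, uses a simple inverted index on the left-hand attribute to find violations, splits the conflict graph into connected components, and repairs each component with a greedy highest-degree deletion heuristic in place of the density-conflict penalty model. All function names and the sample rows are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def fd_conflicts(rows, lhs, rhs):
    """Return pairs (a, b), a < b, of row indices violating the FD lhs -> rhs."""
    index = defaultdict(list)  # inverted index: lhs value -> row indices
    for i, row in enumerate(rows):
        index[row[lhs]].append(i)
    edges = set()
    for ids in index.values():
        for a, b in combinations(ids, 2):
            if rows[a][rhs] != rows[b][rhs]:
                edges.add((a, b))
    return edges


def components(edges):
    """Split the conflict graph into connected components."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, comps = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            v = stack.pop()
            if v in comp:
                continue
            comp.add(v)
            stack.extend(adj[v] - comp)
        seen |= comp
        comps.append(comp)
    return comps, adj


def greedy_repair(edges):
    """Approximate subset repair: delete tuples until no conflict remains.

    Each component is handled independently; within a component the tuple
    with the most remaining conflicts is deleted first (ties: lowest index).
    This is a heuristic and does not guarantee a minimum-size repair.
    """
    comps, adj = components(edges)
    deleted = set()
    for comp in comps:
        live = {v: adj[v] & comp for v in comp}  # copies, safe to mutate
        while any(live.values()):
            v = max(sorted(live), key=lambda u: len(live[u]))
            for u in live.pop(v):
                live[u].discard(v)
            deleted.add(v)
    return deleted


rows = [
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "Newark"},
    {"zip": "60601", "city": "Chicago"},
    {"zip": "60601", "city": "Chicago"},
]
edges = fd_conflicts(rows, "zip", "city")
print(sorted(edges))         # conflicting pairs
print(greedy_repair(edges))  # row indices to delete
```

Because components share no conflicts, each one can be repaired separately, which is the intuition behind using graph decomposition for scalability. In this toy input the greedy choice happens to keep the majority value; the ambiguity the abstract describes arises when several deletions of equal size are possible.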
Taxonomy
Topics: Data Quality and Management · Advanced Graph Neural Networks · Big Data and Digital Economy
