Explain3D: Explaining Disagreements in Disjoint Datasets
Xiaolan Wang, Alexandra Meliou

TL;DR
Explain3D is a framework that identifies reasons for discrepancies in query results across disjoint datasets with different schemas, enhancing understanding of data disagreements in complex, real-world scenarios.
Contribution
The paper formalizes the problem of explaining query result differences over disjoint datasets and introduces a novel three-stage framework with an efficient optimizer.
Findings
Efficiently derives precise explanations for dataset discrepancies.
Outperforms traditional schema matching in explaining query differences.
Demonstrates effectiveness on real-world and synthetic data.
Abstract
Data plays an important role in applications, analytic processes, and many aspects of human activity. As data grows in size and complexity, we are met with an imperative need for tools that promote understanding and explanations over data-related operations. Data management research on explanations has focused on the assumption that data resides in a single dataset, under one common schema. But the reality of today's data is that it is frequently un-integrated, coming from different sources with different schemas. When different datasets provide different answers to semantically similar questions, understanding the reasons for the discrepancies is challenging and cannot be handled by the existing single-dataset solutions. In this paper, we propose Explain3D, a framework for explaining the disagreements across disjoint datasets (3D). Explain3D focuses on identifying the reasons for the…
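The problem setting described in the abstract can be made concrete with a toy example. The sketch below is only an illustration of the setting, not Explain3D's method: the table names, columns, and rows are all hypothetical, and the comparison is a naive name-based diff rather than the paper's framework. It shows two sources with different schemas giving different answers to a semantically similar question ("how many programs are offered?"), together with the kind of tuple-level reasons that could account for the gap.

```python
# Toy illustration of the problem setting only.
# All tables, columns, and rows are hypothetical; this is NOT Explain3D's algorithm.
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Source A lists one row per (program, degree) pair.
    CREATE TABLE src_a_programs (program TEXT, degree TEXT);
    -- Source B lists one row per major, under a different schema.
    CREATE TABLE src_b_majors (major_name TEXT, school TEXT);

    INSERT INTO src_a_programs VALUES
        ('Computer Science', 'BS'),
        ('Computer Science', 'BA'),
        ('Mathematics', 'BS'),
        ('Physics', 'BS');
    INSERT INTO src_b_majors VALUES
        ('Computer Science', 'Engineering'),
        ('Mathematics', 'Science'),
        ('Statistics', 'Science');
""")

# Semantically similar queries over the two sources.
answer_a = conn.execute("SELECT COUNT(*) FROM src_a_programs").fetchone()[0]
answer_b = conn.execute("SELECT COUNT(*) FROM src_b_majors").fetchone()[0]

# Naive diff: assumes program and major_name refer to the same real-world
# entity and can be matched by exact string equality.
names_a = Counter(r[0] for r in conn.execute("SELECT program FROM src_a_programs"))
names_b = Counter(r[0] for r in conn.execute("SELECT major_name FROM src_b_majors"))

diff = {
    "only_in_a": sorted(set(names_a) - set(names_b)),
    "only_in_b": sorted(set(names_b) - set(names_a)),
    # Entities counted more than once in A (a granularity mismatch).
    "repeated_in_a": sorted(k for k, v in names_a.items() if v > 1),
}

print(answer_a, answer_b)
print(diff)
```

Even in this small case the two counts disagree for more than one reason: each source contains an entity the other lacks, and source A counts one entity twice because its rows are at a finer granularity. Exact string matching is a simplifying assumption here; real sources rarely align this cleanly.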
Taxonomy
Topics: Data Quality and Management · Scientific Computing and Data Management · Semantic Web and Ontologies
