Explain3D: Explaining Disagreements in Disjoint Datasets

Xiaolan Wang; Alexandra Meliou

arXiv:1903.09246 · cs.DB · March 25, 2019 · 1 citation

Open Access

TL;DR

Explain3D is a framework that identifies why semantically similar queries return different results across disjoint datasets with different schemas.

Contribution

It formalizes the problem of explaining query result differences over disjoint datasets and introduces a novel 3-stage framework with an efficient optimizer.
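To make the problem setting concrete, here is a minimal toy sketch in Python. It is not the Explain3D algorithm, and nothing in it comes from the paper: the two sources, their schemas, the field names, and the `naive_diff` helper are all hypothetical. It only shows how two semantically similar queries over differently structured datasets can disagree, and how a naive key comparison surfaces one candidate reason.

```python
# Toy illustration of the problem setting only; NOT the Explain3D method.
# Both sources and all field names below are hypothetical.

# Source A, schema: (program, degree)
source_a = [
    {"program": "Computer Science", "degree": "BS"},
    {"program": "Computer Science", "degree": "MS"},
    {"program": "Statistics", "degree": "BS"},
]

# Source B, schema: (major_name, level, campus)
source_b = [
    {"major_name": "computer science", "level": "undergraduate", "campus": "Main"},
    {"major_name": "statistics", "level": "undergraduate", "campus": "Main"},
    {"major_name": "data science", "level": "undergraduate", "campus": "Main"},
]


def programs_a(rows):
    """'How many distinct programs?' asked against source A."""
    return {r["program"].lower() for r in rows}


def programs_b(rows):
    """The semantically similar question asked against source B."""
    return {r["major_name"].lower() for r in rows}


def naive_diff(a, b):
    """Assumed helper: report normalized keys present on only one side."""
    return {"only_in_a": sorted(a - b), "only_in_b": sorted(b - a)}


count_a = len(programs_a(source_a))
count_b = len(programs_b(source_b))
diff = naive_diff(programs_a(source_a), programs_b(source_b))

print(count_a, count_b)  # the two answers disagree
print(diff)              # candidate explanation for the disagreement
```

A key-based diff like this only works when the attribute correspondence is known and the values normalize cleanly. Real disjoint datasets rarely satisfy either condition, which is the gap a dedicated explanation framework targets.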

Findings

1. Efficiently derives precise explanations for dataset discrepancies.

2. Outperforms traditional schema matching in explaining query differences.

3. Demonstrates effectiveness on real-world and synthetic data.

Abstract

Data plays an important role in applications, analytic processes, and many aspects of human activity. As data grows in size and complexity, we are met with an imperative need for tools that promote understanding and explanations over data-related operations. Data management research on explanations has focused on the assumption that data resides in a single dataset, under one common schema. But the reality of today's data is that it is frequently un-integrated, coming from different sources with different schemas. When different datasets provide different answers to semantically similar questions, understanding the reasons for the discrepancies is challenging and cannot be handled by the existing single-dataset solutions. In this paper, we propose Explain3D, a framework for explaining the disagreements across disjoint datasets (3D). Explain3D focuses on identifying the reasons for the…

Taxonomy

Topics: Data Quality and Management · Scientific Computing and Data Management · Semantic Web and Ontologies