Bayes Networks for Supporting Query Processing Over Incomplete Autonomous Databases
Rohit Raghunathan, Sushovan De, Subbarao Kambhampati

TL;DR
This paper introduces a probabilistic approach that uses Bayes networks to improve query processing over incomplete autonomous databases. It outperforms existing methods, especially when tuples are missing values for multiple attributes.
Contribution
It presents a novel Bayes network-based method for imputing missing values and rewriting queries, addressing the limitations of earlier AFD-based approaches such as QPIAD, which make independence assumptions about missing values.
Findings
Bayes networks achieve higher accuracy than existing methods when tuples have multiple missing attribute values.
Reformulated queries using Bayes networks yield better precision and recall.
The approach maintains manageable query processing costs.
Abstract
As the information available to lay users through autonomous data sources continues to increase, mediators become important to ensure that the wealth of information available is tapped effectively. A key challenge that these information mediators need to handle is the varying levels of incompleteness in the underlying databases in terms of missing attribute values. Existing approaches such as QPIAD aim to mine and use Approximate Functional Dependencies (AFDs) to predict and retrieve relevant incomplete tuples. These approaches make independence assumptions about missing values, which critically hobbles their performance when there are tuples containing missing values for multiple correlated attributes. In this paper, we present a principled probabilistic alternative that views an incomplete tuple as defining a distribution over the complete tuples that it stands for. We learn this…
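The idea of an incomplete tuple defining a distribution over its completions can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the three-attribute car schema, the network structure (make -> model -> body), the probabilities, and the function names are hypothetical and do not come from the paper. The "independent" baseline is a deliberately simplified stand-in for per-attribute prediction, not QPIAD's actual AFD-based procedure.

```python
from itertools import product

# Hypothetical toy network (make -> model -> body); all numbers are invented
# for illustration and are not taken from the paper.
DOMAINS = {
    "make": ["Honda", "BMW"],
    "model": ["Civic", "CR-V", "Pilot", "X5"],
    "body": ["Sedan", "SUV"],
}
P_MAKE = {"Honda": 0.5, "BMW": 0.5}
P_MODEL_GIVEN_MAKE = {
    "Honda": {"Civic": 0.4, "CR-V": 0.3, "Pilot": 0.3},
    "BMW": {"X5": 1.0},
}
P_BODY_GIVEN_MODEL = {
    "Civic": {"Sedan": 1.0},
    "CR-V": {"SUV": 1.0},
    "Pilot": {"SUV": 1.0},
    "X5": {"SUV": 1.0},
}


def joint(t):
    """P(make, model, body), factored along the network's edges."""
    return (P_MAKE[t["make"]]
            * P_MODEL_GIVEN_MAKE[t["make"]].get(t["model"], 0.0)
            * P_BODY_GIVEN_MODEL[t["model"]].get(t["body"], 0.0))


def posterior(incomplete):
    """Distribution over completions of a tuple whose missing values are None."""
    missing = [a for a in DOMAINS if incomplete.get(a) is None]
    scores = {}
    for values in product(*(DOMAINS[a] for a in missing)):
        full = {**incomplete, **dict(zip(missing, values))}
        scores[values] = joint(full)
    z = sum(scores.values())  # probability of the observed values
    return missing, {v: p / z for v, p in scores.items()}


def independent_guess(missing, post):
    """Pick each missing attribute's most likely value separately."""
    guess = []
    for i, _ in enumerate(missing):
        marginal = {}
        for values, p in post.items():
            marginal[values[i]] = marginal.get(values[i], 0.0) + p
        guess.append(max(marginal, key=marginal.get))
    return tuple(guess)


# A tuple with a known make but missing model and body style.
missing, post = posterior({"make": "Honda", "model": None, "body": None})
joint_guess = max(post, key=post.get)               # most likely completion
separate_guess = independent_guess(missing, post)   # attribute-by-attribute
print(missing, joint_guess, separate_guess, post[separate_guess])
```

With these invented numbers, the most likely joint completion is a Civic sedan, whereas choosing each attribute separately yields a Civic SUV, a combination that has zero probability under the network. This is the kind of failure on correlated missing attributes that the abstract attributes to independence assumptions.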
Taxonomy
Topics: Bayesian Modeling and Causal Inference · Data Management and Algorithms · Data Quality and Management
