Aggregation in Probabilistic Databases via Knowledge Compilation

Robert Fink; Larisa Han; Dan Olteanu

arXiv:1201.6569 · cs.DB · February 1, 2012 · 2 cites

TL;DR

This paper introduces a query evaluation method for positive relational algebra queries with aggregates over probabilistic databases. It uses knowledge compilation into decomposition trees to compute the probability distributions of aggregate answers efficiently.

Contribution

It develops a technique that compiles semiring and semimodule expressions into decomposition trees. On such a tree, the probability distribution can be computed in time linear in the product of the sizes of the probability distributions represented by its nodes.
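To make the idea of a decomposition tree concrete, here is a minimal Python sketch restricted to the Boolean case: a lineage formula over independent Boolean random variables, already compiled into independent-and, independent-or and Shannon-expansion nodes. The node classes `Var`, `Const`, `IndepAnd`, `IndepOr`, `Shannon` and the function `prob` are hypothetical names chosen for illustration; the paper's decomposition trees also handle general semiring and semimodule expressions, which this sketch does not attempt.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class Const:
    value: bool  # constant true/false leaf


@dataclass(frozen=True)
class Var:
    name: str  # an independent Boolean random variable


@dataclass(frozen=True)
class IndepAnd:
    left: DTree  # children must share no variables
    right: DTree


@dataclass(frozen=True)
class IndepOr:
    left: DTree  # children must share no variables
    right: DTree


@dataclass(frozen=True)
class Shannon:
    var: str   # variable the formula was expanded on
    pos: DTree  # subtree for var = true
    neg: DTree  # subtree for var = false


DTree = Union[Const, Var, IndepAnd, IndepOr, Shannon]


def prob(t: DTree, p: Dict[str, float]) -> float:
    """Probability that the tree evaluates to true, in one bottom-up pass."""
    if isinstance(t, Const):
        return 1.0 if t.value else 0.0
    if isinstance(t, Var):
        return p[t.name]
    if isinstance(t, IndepAnd):
        # Independence lets us multiply the children's probabilities.
        return prob(t.left, p) * prob(t.right, p)
    if isinstance(t, IndepOr):
        # P(a or b) = 1 - P(not a) * P(not b) for independent a, b.
        return 1.0 - (1.0 - prob(t.left, p)) * (1.0 - prob(t.right, p))
    if isinstance(t, Shannon):
        # Condition on the expanded variable and mix the two cases.
        px = p[t.var]
        return px * prob(t.pos, p) + (1.0 - px) * prob(t.neg, p)
    raise TypeError(f"unknown node: {t!r}")


# Lineage (x AND y) OR (x AND z), compiled by Shannon expansion on x:
# x = true leaves (y OR z) with y, z independent; x = false leaves false.
tree = Shannon("x", IndepOr(Var("y"), Var("z")), Const(False))
print(prob(tree, {"x": 0.5, "y": 0.5, "z": 0.5}))  # 0.375
```

Each node is visited once, so once the compilation step has produced the tree, the probability computation itself is linear in the tree size for this Boolean case.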

Findings

01 A prototype of the technique is integrated into the SPROUT probabilistic database engine.

02 Performance experiments are reported on custom datasets and TPC-H data.

03 Syntactic characterisations of tractable queries with aggregates are identified.

Abstract

This paper presents a query evaluation technique for positive relational algebra queries with aggregates on a representation system for probabilistic data based on the algebraic structures of semiring and semimodule. The core of our evaluation technique is a procedure that compiles semimodule and semiring expressions into so-called decomposition trees, for which the computation of the probability distribution can be done in time linear in the product of the sizes of the probability distributions represented by its nodes. We give syntactic characterisations of tractable queries with aggregates by exploiting the connection between query tractability and polynomial-time decomposition trees. A prototype of the technique is incorporated in the probabilistic database engine SPROUT. We report on performance experiments with custom datasets and TPC-H data.
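The cost bound above can be illustrated with a SUM aggregate. When two subexpressions share no random variables, the distribution of their combined SUM is the convolution of their individual distributions, which touches every pair of values. The Python sketch below assumes a deliberately simplified setting: each tuple contributes value `v` with probability `p`, independently of all other tuples, so the whole expression is one chain of independent combinations. The helpers `term_dist`, `convolve` and `sum_dist` are hypothetical and are not SPROUT's API.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Dist = Dict[int, float]  # aggregate value -> probability


def term_dist(p: float, value: int) -> Dist:
    """Distribution of one term: contributes `value` if its tuple is present
    (probability p), else 0, the neutral element of SUM."""
    if value == 0:
        return {0: 1.0}
    return {value: p, 0: 1.0 - p}


def convolve(a: Dist, b: Dist) -> Dist:
    """SUM of two independent distributions; does |a| * |b| multiplications."""
    out: Dict[int, float] = defaultdict(float)
    for va, pa in a.items():
        for vb, pb in b.items():
            out[va + vb] += pa * pb
    return dict(out)


def sum_dist(terms: List[Tuple[float, int]]) -> Dist:
    """Distribution of the SUM over independent (probability, value) terms."""
    result: Dist = {0: 1.0}  # empty sum is 0 with certainty
    for p, v in terms:
        result = convolve(result, term_dist(p, v))
    return result


print(sorted(sum_dist([(0.5, 10), (0.5, 20)]).items()))
# [(0, 0.25), (10, 0.25), (20, 0.25), (30, 0.25)]
```

Each `convolve` call costs the product of its input distribution sizes. This mirrors, in a much simpler form, why the abstract states the bound in terms of the product of the sizes of the distributions represented by the tree's nodes.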


Taxonomy

Topics: Data Management and Algorithms · Advanced Database Systems and Queries · Semantic Web and Ontologies