Aggregation in Probabilistic Databases via Knowledge Compilation
Robert Fink, Larisa Han, Dan Olteanu

TL;DR
This paper introduces a query evaluation method for probabilistic databases that compiles semiring and semimodule expressions into decomposition trees, from which the probability distributions of aggregate query answers are computed.
Contribution
It develops a procedure that compiles semiring and semimodule expressions into decomposition trees, on which the probability distribution can be computed in time linear in the product of the sizes of the probability distributions represented by the tree's nodes.
Findings
Prototype incorporated in the probabilistic database engine SPROUT
Performance experiments reported on custom datasets and TPC-H data
Syntactic characterisations given for tractable queries with aggregates
Abstract
This paper presents a query evaluation technique for positive relational algebra queries with aggregates on a representation system for probabilistic data based on the algebraic structures of semiring and semimodule. The core of our evaluation technique is a procedure that compiles semimodule and semiring expressions into so-called decomposition trees, for which the computation of the probability distribution can be done in time linear in the product of the sizes of the probability distributions represented by its nodes. We give syntactic characterisations of tractable queries with aggregates by exploiting the connection between query tractability and polynomial-time decomposition trees. A prototype of the technique is incorporated in the probabilistic database engine SPROUT. We report on performance experiments with custom datasets and TPC-H data.
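As a rough illustration of why the cost depends on the product of distribution sizes, the sketch below combines the probability distributions of two independent sub-expressions under a binary operation, here addition for a SUM aggregate. It is a minimal sketch under stated assumptions, not the paper's compilation procedure or SPROUT's implementation: the function name `combine`, the dictionary representation of distributions, and the example tuples are all hypothetical.

```python
from collections import defaultdict
import operator


def combine(dist_a, dist_b, op):
    """Distribution of op(A, B) for two independent random variables.

    dist_a and dist_b map values to probabilities (hypothetical
    representation). The nested loop visits len(dist_a) * len(dist_b)
    value pairs, i.e. the product of the two distribution sizes.
    """
    out = defaultdict(float)
    for a, pa in dist_a.items():
        for b, pb in dist_b.items():
            # Independence lets us multiply the probabilities.
            out[op(a, b)] += pa * pb
    return dict(out)


# Assumed example: two independent tuples. Each contributes its value
# to the SUM if present and 0 if absent.
t1 = {0: 0.5, 10: 0.5}     # present with probability 0.5, value 10
t2 = {0: 0.75, 20: 0.25}   # present with probability 0.25, value 20

sum_dist = combine(t1, t2, operator.add)
for value in sorted(sum_dist):
    print(value, sum_dist[value])
```

Passing a different associative operation as `op` gives the analogous combination step for other aggregates, provided the value used for an absent tuple is neutral for that operation.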
Taxonomy
Topics: Data Management and Algorithms · Advanced Database Systems and Queries · Semantic Web and Ontologies
