Probabilistic Query Evaluation with Bag Semantics
Martin Grohe, Peter Lindner, Christoph Standke

TL;DR
This paper investigates the complexity of evaluating self-join free conjunctive queries on probabilistic databases under bag semantics, extending known set-semantics results to databases in which fact multiplicities may be unbounded.
Contribution
It introduces a framework for probabilistic query evaluation with bag semantics, establishing complexity results and a dichotomy for self-join free conjunctive queries.
Findings
Expectations of answer tuple multiplicities are efficiently computable.
A complexity dichotomy holds for self-join free conjunctive queries: depending on the query, evaluation is either solvable in polynomial time or #P-hard.
The paper extends known set semantics results to bag semantics with unbounded multiplicities.
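The first finding can be illustrated with a small sketch. It is not the paper's algorithm. It assumes that, under bag semantics, the answer multiplicity of a Boolean conjunctive query is the sum, over all valuations of its variables, of the product of the multiplicities of the matched facts. It also assumes that each fact's expected multiplicity is given and finite. For a self-join free query, the facts matched by one valuation lie in different relations and are therefore independent, so the expectation of each product is the product of the expectations, and linearity of expectation handles the sum. The function name `expected_multiplicity` and the dictionary encoding of the database are hypothetical choices made for this illustration.

```python
from itertools import product


def expected_multiplicity(atoms, expected_mult):
    """Expected answer multiplicity of a Boolean self-join free CQ.

    atoms: list of (relation, variables) pairs, each relation used once.
    expected_mult: dict mapping (relation, constants) to the expected
        multiplicity of that fact (assumed finite; hypothetical encoding).
    """
    relations = [rel for rel, _ in atoms]
    if len(set(relations)) != len(relations):
        raise ValueError("query must be self-join free")

    variables = sorted({v for _, vs in atoms for v in vs})
    # Active domain: every constant that occurs in some listed fact.
    domain = sorted({c for (_, consts) in expected_mult for c in consts})

    total = 0.0
    # Brute-force enumeration of valuations; polynomial in the data size
    # for a fixed query, but not optimized.
    for values in product(domain, repeat=len(variables)):
        valuation = dict(zip(variables, values))
        term = 1.0
        for rel, vs in atoms:
            fact = (rel, tuple(valuation[v] for v in vs))
            # Facts not listed have expected multiplicity 0.
            term *= expected_mult.get(fact, 0.0)
        total += term
    return total


# Example query: Q() :- R(x), S(x, y)
atoms = [("R", ("x",)), ("S", ("x", "y"))]
expected_mult = {
    ("R", ("a",)): 0.5,
    ("R", ("b",)): 2.0,
    ("S", ("a", "c")): 1.5,
    ("S", ("b", "c")): 1.0,
    ("S", ("b", "d")): 0.5,
}
# 0.5*1.5 + 2.0*1.0 + 2.0*0.5 = 3.75
print(expected_multiplicity(atoms, expected_mult))
```

The sketch covers only expectations. The probability that an answer's multiplicity is at most some bound does not decompose this way, which is where the dichotomy comes in.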
Abstract
We study the complexity of evaluating queries on probabilistic databases under bag semantics. We focus on self-join free conjunctive queries, and probabilistic databases where occurrences of different facts are independent, which is the natural generalization of tuple-independent probabilistic databases to the bag semantics setting. For set semantics, the data complexity of this problem is well understood, even for the more general class of unions of conjunctive queries: it is either in polynomial time, or #P-hard, depending on the query (Dalvi & Suciu, JACM 2012). A reasonably general model of bag probabilistic databases may have unbounded multiplicities. In this case, the probabilistic database is no longer finite, and a careful treatment of representation mechanisms is required. Moreover, the answer to a Boolean query is a probability distribution over (possibly all) non-negative…
