
TL;DR
This paper extends probabilistic databases to infinite, possibly uncountable, probability spaces using finite point processes, so that queries over continuous probabilistic models have well-defined semantics.
Contribution
It introduces a systematic way to model infinite probabilistic databases with uncountable spaces using finite point processes, addressing foundational measurability issues.
Findings
Relational algebra queries are shown to be measurable
Aggregate and Datalog queries are shown to be measurable
The framework supports modeling continuous probability distributions in PDBs
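To make the idea concrete, here is a minimal sketch, not the paper's construction, of a PDB viewed as a finite point process: each sampled instance is a finite collection of facts whose attribute values come from a continuous distribution. The relation name `Temp`, the uniform distribution on the number of facts, the Gaussian readings, and all function names are assumptions made for illustration only. A selection followed by a COUNT aggregate then becomes a random variable, and the probability of an event about it can be estimated by sampling.

```python
import random

def sample_instance(rng, max_facts=5):
    """Draw one database instance: a finite list of facts Temp(sensor_id, reading).

    Assumption for illustration: the number of facts is uniform on 0..max_facts
    and each reading is drawn independently from a Gaussian.
    """
    n = rng.randint(0, max_facts)  # random, but always finite, number of facts
    return [("Temp", i, rng.gauss(20.0, 2.0)) for i in range(n)]

def count_hot(instance, threshold=21.0):
    """Selection (reading > threshold) followed by a COUNT aggregate."""
    return sum(1 for (_, _, reading) in instance if reading > threshold)

def estimate_probability(event, trials=10_000, seed=0):
    """Monte Carlo estimate of the probability that `event` holds on a random instance."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(trials) if event(sample_instance(rng)))
    return hits / trials

# Estimated probability that at least one reading exceeds the threshold.
p = estimate_probability(lambda db: count_hot(db) >= 1)
print(p)
```

The estimate is only meaningful if the event "at least one reading exceeds the threshold" is measurable, which is the kind of guarantee the findings above are about.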
Abstract
Probabilistic databases (PDBs) model uncertainty in data in a quantitative way. In the established formal framework, probabilistic (relational) databases are finite probability spaces over relational database instances. This finiteness can clash with intuitive query behavior (Ceylan et al., KR 2016), and with application scenarios that are better modeled by continuous probability distributions (Dalvi et al., CACM 2009). We formally introduced infinite PDBs in (Grohe and Lindner, PODS 2019) with a primary focus on countably infinite spaces. However, an extension beyond countable probability spaces raises nontrivial foundational issues concerning the measurability of events and queries and, ultimately, the question of whether queries have a well-defined semantics. We argue that finite point processes are an appropriate model from probability theory for dealing with general…
