Independence in Infinite Probabilistic Databases

Martin Grohe; Peter Lindner

arXiv:2011.00096 · cs.DB · June 1, 2022

TL;DR

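This paper lays the mathematical foundations of infinite probabilistic databases (PDBs), viewed as infinite probability spaces over database instances. It focuses on independence assumptions, in particular tuple-independent and block-independent disjoint PDBs, and introduces models and algorithms for infinite and open-world scenarios.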

Contribution

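The paper extends probabilistic database theory from finite to infinite probability spaces, studies independence in that setting (including bag semantics and uncountable fact spaces), and proposes new models and an approximate query answering algorithm for countable PDBs.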

Findings

01 Established mathematical foundations for infinite PDBs.

02 Analyzed independence in infinite and uncountable fact spaces.

03 Proposed an approximate query answering algorithm for countable PDBs.

Abstract

Probabilistic databases (PDBs) model uncertainty in data. The current standard is to view PDBs as finite probability spaces over relational database instances. Since many attributes in typical databases have infinite domains, such as integers, strings, or real numbers, it is often more natural to view PDBs as infinite probability spaces over database instances. In this paper, we lay the mathematical foundations of infinite probabilistic databases. Our focus then is on independence assumptions. Tuple-independent PDBs play a central role in theory and practice of PDBs. Here, we study infinite tuple-independent PDBs as well as related models such as infinite block-independent disjoint PDBs. While the standard model of PDBs focuses on a set-based semantics, we also study tuple-independent PDBs with a bag semantics and independence in PDBs over uncountable fact spaces. We also propose a…

Taxonomy

Topics: Logic, Reasoning, and Knowledge · Bayesian Modeling and Causal Inference · Data Management and Algorithms