Declarative Statistical Modeling with Datalog
Vince Barany, Balder ten Cate, Benny Kimelfeld, Dan Olteanu, and Zografoula Vagena

TL;DR
This paper introduces a declarative framework based on an extension of Datalog for specifying statistical models integrated with databases, enabling probabilistic reasoning with a robust semantics and natural incorporation of observations.
Contribution
It extends Datalog to include numerical probability functions, providing a natural, declarative way to specify statistical models with formal semantics and handling of complex outcome spaces.
Findings
The framework supports probabilistic modeling over databases.
The semantics is based on cylinder sets, which makes it robust.
Conditions are identified under which the outcome space is finite.
Abstract
Formalisms for specifying statistical models, such as probabilistic-programming languages, typically consist of two components: a specification of a stochastic process (the prior), and a specification of observations that restrict the probability space to a conditional subspace (the posterior). Use cases of such formalisms include the development of algorithms in machine learning and artificial intelligence. We propose and investigate a declarative framework for specifying statistical models on top of a database, through an appropriate extension of Datalog. By virtue of extending Datalog, our framework offers a natural integration with the database, and has a robust declarative semantics. Our Datalog extension provides convenient mechanisms to include numerical probability functions; in particular, conclusions of rules may contain values drawn from such functions. The semantics of a…
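The two components named in the abstract, a stochastic process (the prior) and observations that condition it (the posterior), can be illustrated with a small simulation. The sketch below is not the paper's syntax, semantics, or algorithm. It assumes a hypothetical relation City(name, rate) and three hypothetical rules whose conclusions contain Bernoulli draws (written Flip[p] here for illustration), and it estimates a conditional probability by plain rejection sampling, a technique swapped in only to keep the example short.

```python
import random

# Hypothetical extensional database: City(name, earthquake_rate).
CITIES = [("napa", 0.1), ("yuma", 0.02)]


def flip(p, rng):
    """Bernoulli draw: returns 1 with probability p, otherwise 0."""
    return 1 if rng.random() < p else 0


def sample_outcome(cities, rng):
    """Sample one possible outcome (a set of facts) from the prior.

    Illustrative rules (assumed, not taken from the paper):
      Earthquake(c, Flip[r])  :- City(c, r)
      Alarm(c, Flip[0.9])     :- Earthquake(c, 1)
      Alarm(c, Flip[0.05])    :- Earthquake(c, 0)
    """
    facts = set()
    for city, rate in cities:
        quake = flip(rate, rng)
        facts.add(("Earthquake", city, quake))
        alarm_p = 0.9 if quake == 1 else 0.05
        facts.add(("Alarm", city, flip(alarm_p, rng)))
    return facts


def estimate_posterior(cities, observed, query, n_samples=50000, seed=0):
    """Estimate P(query | observed) by rejection sampling.

    Outcomes that do not contain every observed fact are discarded,
    which restricts the prior to the conditional subspace.
    """
    rng = random.Random(seed)
    kept = hits = 0
    for _ in range(n_samples):
        outcome = sample_outcome(cities, rng)
        if observed <= outcome:  # all observed facts hold in this outcome
            kept += 1
            if query in outcome:
                hits += 1
    return hits / kept if kept else float("nan")


if __name__ == "__main__":
    p = estimate_posterior(
        CITIES,
        observed={("Alarm", "napa", 1)},
        query=("Earthquake", "napa", 1),
    )
    print(f"Estimated P(Earthquake(napa) | Alarm(napa)) = {p:.3f}")
```

Under these assumed parameters, Bayes' rule gives 0.1 · 0.9 / (0.1 · 0.9 + 0.9 · 0.05) = 2/3 for the query, so the estimate should land near that value. Rejection sampling is used only because it mirrors the prior/posterior split directly; it becomes inefficient when observations are unlikely.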
