GenSQL: A Probabilistic Programming System for Querying Generative Models of Database Tables
Mathieu Huot, Matin Ghavami, Alexander K. Lew, Ulrich Schaechtle, Cameron E. Freer, Zane Shelby, Martin C. Rinard, Feras A. Saad, Vikash K. Mansinghka

TL;DR
GenSQL is a probabilistic programming system that extends SQL with primitives for querying generative models of database tables, enabling complex Bayesian inference workflows with improved accuracy, conciseness, and efficiency.
Contribution
It introduces a formalized probabilistic extension of SQL that supports models written in a variety of probabilistic programming languages, backed by proofs of soundness and by demonstrated advantages on real-world data analysis tasks.
Findings
More accurate data modeling compared to baselines
More concise and less error-prone syntax
Significant speedup over competitors
Abstract
This article presents GenSQL, a probabilistic programming system for querying probabilistic generative models of database tables. By augmenting SQL with only a few key primitives for querying probabilistic models, GenSQL enables complex Bayesian inference workflows to be concisely implemented. GenSQL's query planner rests on a unified programmatic interface for interacting with probabilistic models of tabular data, which makes it possible to use models written in a variety of probabilistic programming languages that are tailored to specific workflows. Probabilistic models may be automatically learned via probabilistic program synthesis, hand-designed, or a combination of both. GenSQL is formalized using a novel type system and denotational semantics, which together enable us to establish proofs that precisely characterize its soundness guarantees. We evaluate our system on two case…
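The abstract says GenSQL's query planner rests on a unified programmatic interface to probabilistic models of tabular data, so that backends written in different languages can be queried the same way. This summary does not spell out that interface. The sketch below therefore assumes a hypothetical two-method interface (`logpdf` for conditional log-probabilities, `simulate` for conditional sampling), a toy backend built on an explicit joint probability table, and two hypothetical query helpers, `probability_of` and `generate`, that only loosely echo the kind of queries the abstract describes. None of these names come from GenSQL itself.

```python
import math
import random
from typing import Dict, List, Mapping, Optional, Protocol, Tuple

Row = Dict[str, str]


class TabularModel(Protocol):
    """Hypothetical unified interface: any model backend exposes these two methods."""

    def logpdf(self, target: Mapping[str, str],
               given: Optional[Mapping[str, str]] = None) -> float: ...

    def simulate(self, columns: List[str],
                 given: Optional[Mapping[str, str]] = None,
                 rng: Optional[random.Random] = None) -> Row: ...


class JointTableModel:
    """Toy backend (assumption): an explicit joint table over categorical columns."""

    def __init__(self, columns: List[str], table: Dict[Tuple[str, ...], float]):
        self.columns = columns
        total = sum(table.values())
        self.table = {k: v / total for k, v in table.items()}  # normalize to sum 1

    def _matches(self, key: Tuple[str, ...], constraints: Mapping[str, str]) -> bool:
        return all(key[self.columns.index(c)] == v for c, v in constraints.items())

    def _mass(self, constraints: Mapping[str, str]) -> float:
        # Total probability of all table cells consistent with the constraints.
        return sum(p for k, p in self.table.items() if self._matches(k, constraints))

    def logpdf(self, target, given=None):
        given = dict(given or {})
        for c, v in target.items():
            if c in given and given[c] != v:  # target contradicts the condition
                return float("-inf")
        num = self._mass({**given, **target})
        if num == 0:
            return float("-inf")
        return math.log(num / self._mass(given))  # log P(target | given)

    def simulate(self, columns, given=None, rng=None):
        rng = rng or random.Random()
        given = dict(given or {})
        keys = [k for k, p in self.table.items() if p > 0 and self._matches(k, given)]
        if not keys:
            raise ValueError("conditioning event has zero probability")
        key = rng.choices(keys, weights=[self.table[k] for k in keys], k=1)[0]
        return {c: key[self.columns.index(c)] for c in columns}


def probability_of(model: TabularModel, rows: List[Row], target_cols: List[str]) -> List[float]:
    """Hypothetical query: P(target columns | the row's other columns), per row."""
    out = []
    for row in rows:
        target = {c: row[c] for c in target_cols}
        given = {c: v for c, v in row.items() if c not in target_cols}
        out.append(math.exp(model.logpdf(target, given)))
    return out


def generate(model: TabularModel, columns: List[str],
             given: Mapping[str, str], n: int, seed: int = 0) -> List[Row]:
    """Hypothetical query: draw n synthetic rows from the model, conditioned on `given`."""
    rng = random.Random(seed)
    return [model.simulate(columns, given, rng) for _ in range(n)]


# Usage: the query helpers see only the interface, never the backend's internals.
model = JointTableModel(
    ["smoker", "cough"],
    {("yes", "yes"): 20, ("yes", "no"): 10, ("no", "yes"): 10, ("no", "no"): 60},
)
rows = [{"smoker": "yes", "cough": "yes"}, {"smoker": "no", "cough": "yes"}]
print(probability_of(model, rows, ["cough"]))
print(generate(model, ["cough"], {"smoker": "yes"}, n=3))
```

The design point being illustrated is that `probability_of` and `generate` depend only on the two interface methods, so swapping `JointTableModel` for a backend written in another probabilistic programming language would leave the query code unchanged.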
