ZaliQL: A SQL-Based Framework for Drawing Causal Inference from Big Data
Babak Salimi, Dan Suciu

TL;DR
ZaliQL introduces a SQL-based framework that enables scalable causal inference directly within database systems, supporting advanced methods and optimizations for large observational datasets.
Contribution
It presents a novel suite of SQL techniques for causal inference that scale to big data and incorporate optimization strategies for improved performance.
Findings
Supports state-of-the-art causal inference methods in SQL
Achieves significant speedups with optimization techniques
Effectively handles large observational datasets
Abstract
Causal inference from observational data is a subject of active research and development in statistics and computer science. Many toolkits that depend on statistical software have been developed for this purpose; however, these toolkits do not scale to large datasets. In this paper we describe a suite of techniques for expressing causal inference tasks from observational data in SQL. This suite supports the state-of-the-art methods for causal inference and runs at scale within a database engine. In addition, we introduce several optimization techniques that significantly speed up causal inference, in both the online and offline settings. We evaluate the quality and performance of our techniques through experiments on real datasets.
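The abstract's central idea is that a causal inference task can be written as an ordinary SQL query and run inside the database engine. The sketch below illustrates this with exact matching on a coarsened covariate, in the spirit of coarsened exact matching. It estimates the average treatment effect on the treated (ATT). It is not ZaliQL's own code. The table `units`, its columns `t` (treatment), `age` (covariate) and `y` (outcome), the decade-bucket coarsening rule, and the toy rows are all hypothetical. Python's built-in `sqlite3` module stands in for a full database engine.

```python
import sqlite3

# Hypothetical toy data: (treatment t, covariate age, outcome y).
ROWS = [
    (1, 25, 10.0), (1, 28, 12.0), (0, 22, 7.0), (0, 29, 9.0),  # 20s: both groups
    (1, 35, 20.0), (0, 31, 14.0), (0, 38, 16.0),              # 30s: both groups
    (1, 45, 30.0),                                            # 40s: treated only
    (0, 52, 18.0),                                            # 50s: control only
]

CEM_ATT_QUERY = """
WITH coarsened AS (
    -- Coarsen the covariate into decade buckets (an assumed coarsening rule).
    SELECT t, y, CAST(age AS INTEGER) / 10 AS age_bucket
    FROM units
),
strata AS (
    -- One row per stratum; keep only strata with both treated and control units.
    SELECT age_bucket,
           AVG(CASE WHEN t = 1 THEN y END) AS y_treated,
           AVG(CASE WHEN t = 0 THEN y END) AS y_control,
           SUM(t) AS n_treated
    FROM coarsened
    GROUP BY age_bucket
    HAVING SUM(t) > 0 AND SUM(1 - t) > 0
)
-- Weight each stratum's mean difference by its number of treated units (ATT).
SELECT SUM(n_treated * (y_treated - y_control)) * 1.0 / SUM(n_treated)
FROM strata
"""


def estimate_att(rows):
    """Load rows into an in-memory table and run the matching query.

    Returns None when no stratum contains both treated and control units.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE units (t INTEGER, age INTEGER, y REAL)")
        conn.executemany("INSERT INTO units VALUES (?, ?, ?)", rows)
        (att,) = conn.execute(CEM_ATT_QUERY).fetchone()
        return att
    finally:
        conn.close()


if __name__ == "__main__":
    print(round(estimate_att(ROWS), 4))  # prints 3.6667
```

On the toy rows, the 40s and 50s buckets are discarded because they lack a comparison group. The 20s bucket contributes a difference of 3 with two treated units, and the 30s bucket contributes 5 with one treated unit, so the estimate is (2·3 + 1·5)/3 ≈ 3.6667. Because the whole estimate is one grouped query, it runs where the data lives rather than in external statistical software, which is the scalability concern the abstract raises.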
Taxonomy
Topics: Bayesian Modeling and Causal Inference · Data Quality and Management · Scientific Computing and Data Management
Methods: Causal inference
