Calibration: A Simple Trick for Wide-table Delta Analytics
Zezhou Huang, Eugene Wu

TL;DR
This paper introduces the Calibrated Junction Hypertree (CJT), a data structure that accelerates wide-table delta analytics by materializing messages for reuse and supporting incremental updates, outperforming existing methods by up to 10^5x.
Contribution
The paper proposes CJT, a new data structure inspired by probabilistic graphical models, to optimize wide-table delta analytics with fast construction, message reuse, and incremental maintenance.
Findings
CJT achieves 30x to 10^5x speedups over state-of-the-art algorithms.
CJT is effective across multiple platforms including cloud DBs and Pandas.
CJT benefits applications like OLAP, query explanation, streaming, and ML data augmentation.
Abstract
Data analytics over normalized databases typically requires computing and materializing expensive joins (wide-tables). Factorized query execution models execution as message passing between relations in the join graph and pushes aggregations through joins to reduce intermediate result sizes. Although this accelerates query execution, it only optimizes a single wide-table query. In contrast, wide-table analytics is usually interactive, and users want to apply deltas to the initial query structure. For instance, users want to slice, dice, and drill down on dimensions, update parts of the tables, and join with new tables for enrichment. Such Wide-table Delta Analytics offers novel work-sharing opportunities. This work shows that carefully materializing messages during query execution can accelerate Wide-table Delta Analytics by more than 10^5x compared to factorized execution, and only incurs a constant…
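To make the message-passing idea concrete, here is a minimal sketch in Python. It is not the authors' implementation: the three-relation chain join, the toy data, and every function name are assumptions chosen for illustration, and it covers only COUNT(*). Each relation sends its neighbor a small per-join-key count (a "message"), so the total can be computed without building the wide table. Once messages in both directions are materialized, a delta such as a filter on T is answered by scanning T alone.

```python
from collections import Counter
from itertools import product

# Hypothetical chain join R(a, b) JOIN S(b, c) JOIN T(c, d); data is illustrative.
R = [(1, 10), (2, 10), (3, 20)]
S = [(10, 100), (20, 100), (20, 200)]
T = [(100, "x"), (100, "y"), (200, "x")]

def naive_count(R, S, T):
    """Baseline: COUNT(*) by enumerating the full wide table."""
    return sum(
        1
        for (_, b), (b2, c), (c2, _) in product(R, S, T)
        if b == b2 and c == c2
    )

def msg_R_to_S(R):
    """Message R -> S: number of R rows per join key b."""
    return Counter(b for _, b in R)

def msg_T_to_S(T):
    """Message T -> S: number of T rows per join key c."""
    return Counter(c for c, _ in T)

def msg_S_to_T(S, m_r):
    """Message S -> T: number of joined (R, S) rows per join key c."""
    out = Counter()
    for b, c in S:
        out[c] += m_r[b]  # Counter yields 0 for keys absent from R
    return out

# Initial query: aggregate at S using both incoming messages.
m_r = msg_R_to_S(R)
m_t = msg_T_to_S(T)
total = sum(m_r[b] * m_t[c] for b, c in S)

# Materialize the message flowing the other way so T can also answer queries.
m_s_t = msg_S_to_T(S, m_r)

# Delta query: COUNT(*) WHERE d = 'x'. Only T is scanned; m_s_t is reused.
filtered = sum(m_s_t[c] for c, d in T if d == "x")

print(total, filtered)
```

The delta query touches neither R nor S, which is the work-sharing opportunity the abstract describes. In this sketch the reused state is a single `Counter`; how messages are stored and maintained in the actual system is not specified by the text above.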
Taxonomy
Topics: Reservoir Engineering and Simulation Methods · Computational Physics and Python Applications
