A Step Toward Deep Online Aggregation (Extended Version)
Nikhil Sheoran, Supawit Chockchowwat, Arav Chheda, Suwen Wang, Riya Verma, Yongjoo Park

TL;DR
This paper introduces evolving data frames (edf), a novel data model enabling online aggregation for nested operations, allowing more interactive and complex data exploration with faster initial estimates.
Contribution
It proposes a new data model, edf, that supports online aggregation for nested operations, extending the capabilities of existing online aggregation systems.
Findings
Wake, the system implementing edf, produces initial estimates 4.93x faster than traditional systems
Wake achieves 1.3x median slowdown for exact answers
Wake is 1.92x faster than existing OLA systems for high-precision estimates
Abstract
For exploratory data analysis, it is often desirable to know what answers you are likely to get before actually obtaining those answers. This can potentially be achieved by designing systems to offer the estimates of a data operation result, say op(data), earlier in the process based on partial data processing. Those estimates continuously refine as more data is processed and finally converge to the exact answer. Unfortunately, the existing techniques, called Online Aggregation (OLA), are limited to a single operation; that is, we cannot obtain the estimates for op(op(data)) or op(...(op(data))). If this Deep OLA becomes possible, data analysts will be able to explore data more interactively using complex cascade operations. In this work, we take a step toward Deep OLA with evolving data frames (edf), a novel data model to offer OLA for nested ops, op(...(op(data))), by…
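The idea of single-level OLA, and what changes for a nested query such as op(op(data)), can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions. It is not the edf data model or Wake's implementation. The sketch makes three assumptions. First, rows are assumed to arrive in random order, so that each processed prefix is a uniform sample. Second, the single-level estimator just scales the running sum by n/seen. Third, the nested case (max over per-group sums) simply re-applies the outer operation to the current inner estimates after every batch. All function names are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

Row = Tuple[str, float]  # (group key, numeric value); hypothetical schema


def online_sum(rows: List[Row], batch_size: int) -> Iterator[float]:
    """Single-level OLA sketch: yield an estimate of sum(value) after each batch.

    Assumes rows are in random order, so the processed prefix is a uniform sample.
    """
    n = len(rows)
    partial = 0.0
    for start in range(0, n, batch_size):
        for _, value in rows[start:start + batch_size]:
            partial += value
        seen = min(start + batch_size, n)
        # Scale by the inverse sampling fraction; the factor is exactly 1.0 at the end.
        yield partial * (n / seen)


def online_group_sums(rows: List[Row], batch_size: int) -> Iterator[Dict[str, float]]:
    """Inner op: evolving estimates of the per-group sums."""
    n = len(rows)
    partial: Dict[str, float] = defaultdict(float)
    for start in range(0, n, batch_size):
        for key, value in rows[start:start + batch_size]:
            partial[key] += value
        seen = min(start + batch_size, n)
        scale = n / seen
        yield {key: s * scale for key, s in partial.items()}


def nested_max_of_sums(rows: List[Row], batch_size: int) -> Iterator[float]:
    """Outer op (max) re-evaluated on each evolving inner result: op(op(data))."""
    for group_sums in online_group_sums(rows, batch_size):
        yield max(group_sums.values())


if __name__ == "__main__":
    random.seed(0)
    data = [(random.choice("abc"), random.random()) for _ in range(10_000)]
    random.shuffle(data)  # random order makes every prefix a uniform sample
    estimates = list(nested_max_of_sums(data, batch_size=1_000))
    exact = max(sum(v for k, v in data if k == g) for g in "abc")
    print("first estimate:", estimates[0])
    print("final estimate:", estimates[-1], "exact:", exact)
```

Each generator yields one estimate per batch. The last estimate equals the exact answer because the scale factor reaches 1. The nested estimate is a plain plug-in: taking the max of noisy but unbiased group sums is biased upward in expectation. The sketch makes no attempt to quantify or correct that error.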
Taxonomy
Topics: Data Stream Mining Techniques · Machine Learning and Data Classification · Time Series Analysis and Forecasting
