OmniSketch: Efficient Multi-Dimensional High-Velocity Stream Analytics with Arbitrary Predicates
Wieger R. Punter, Odysseas Papapetrou, Minos Garofalakis

TL;DR
OmniSketch is a novel data sketching technique that efficiently supports complex multi-attribute aggregate queries with filters on high-velocity data streams, providing probabilistic guarantees and outperforming existing methods.
Contribution
The paper introduces the first scalable sketch for multi-attribute streams that supports arbitrary filters and aggregates with probabilistic guarantees and efficient updates.
Findings
Outperforms state-of-the-art sketches in accuracy and efficiency.
Supports complex ad-hoc queries with small memory footprint.
Provides probabilistic guarantees with logarithmic update and query complexity.
Abstract
A key need in different disciplines is to perform analytics over fast-paced data streams, similar in nature to traditional OLAP analytics in relational databases, i.e., with filters and aggregates. Storing unbounded streams, however, is neither realistic nor desirable, due to the high storage requirements and the delays introduced when storing massive data. Accordingly, many synopses/sketches have been proposed that summarize the stream in small memory (usually small enough to fit in RAM), so that aggregate queries can be efficiently approximated without storing the full stream. However, past synopses predominantly focus on summarizing single-attribute streams, and cannot efficiently handle filters and constraints on arbitrary subsets of multiple attributes. In this work, we propose OmniSketch, the first sketch that scales to fast-paced and complex data…
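To make the contrast with "single-attribute" synopses concrete, below is a minimal sketch of a classic Count-Min sketch in Python. This is not OmniSketch; it is a standard single-attribute synopsis used here only as an illustration, and the class name, the row-salted `blake2b` hashing, and the example data are assumptions chosen for the illustration. With width w = ⌈e/ε⌉ and depth d = ⌈ln(1/δ)⌉, a Count-Min estimate never undercounts and exceeds the true count by at most εN with probability at least 1 − δ, where N is the total stream count.

```python
import hashlib


class CountMinSketch:
    """Classic Count-Min sketch over ONE attribute (illustrative, not OmniSketch)."""

    def __init__(self, width: int, depth: int) -> None:
        self.width = width
        self.depth = depth
        # depth rows of width counters each, all starting at zero.
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, row: int, key: str) -> int:
        # Assumption: a per-row hash derived by salting the key with the row
        # number; a practical stand-in for a pairwise-independent hash family.
        digest = hashlib.blake2b(f"{row}:{key}".encode(), digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.width

    def update(self, key: str, count: int = 1) -> None:
        # Each arriving item increments one counter per row.
        for row in range(self.depth):
            self.table[row][self._index(row, key)] += count

    def estimate(self, key: str) -> int:
        # Collisions only ever add to a counter, so the row minimum is an
        # upper bound on (and never below) the true count of the key.
        return min(self.table[row][self._index(row, key)] for row in range(self.depth))


# Hypothetical usage: a stream of values of a single attribute (source IP).
cms = CountMinSketch(width=272, depth=5)  # roughly epsilon = 0.01, delta = e^-5
for ip in ["10.0.0.1"] * 40 + ["10.0.0.2"] * 7:
    cms.update(ip)
print(cms.estimate("10.0.0.1"))  # never below the true count of 40
```

Such a sketch answers point queries on the one attribute it was built over. Answering a conjunctive filter such as "source = a AND destination = b" would typically need a separate sketch keyed on each attribute combination one expects to query, which illustrates why arbitrary multi-attribute predicates are hard for single-attribute synopses.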
Taxonomy
Topics: Advanced Database Systems and Queries · Data Stream Mining Techniques · Data Management and Algorithms
