Whiz: A Fast and Flexible Data Analytics System
Robert Grandl, Arjun Singhvi, Raajay Viswanathan, Aditya Akella

TL;DR
Whiz is a data analytics framework that separates computation from intermediate data, enabling flexible, data-driven processing and performing 1.3-2x better than state-of-the-art systems in the authors' experiments.
Contribution
The paper introduces a data-centric analytics system in which programmable monitoring gives runtime visibility into intermediate data, and an event abstraction lets intermediate data values drive when and what computation runs.
Findings
A Whiz prototype on a large cluster performs 1.3-2x better than state-of-the-art systems
Supports batch, streaming, and graph analytics workloads
Enables runtime visibility and data-driven computation
Abstract
Today's data analytics frameworks are compute-centric, with analytics execution almost entirely dependent on the pre-determined physical structure of the high-level computation. Relegating intermediate data to a second-class entity in this manner hurts flexibility, performance, and efficiency. We present Whiz, a new analytics framework that cleanly separates computation from intermediate data. It enables runtime visibility into data via programmable monitoring, and data-driven computation (where intermediate data values drive when/what computation runs) via an event abstraction. Experiments with a Whiz prototype on a large cluster using batch, streaming, and graph analytics workloads show that its performance is 1.3-2x better than state-of-the-art.
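The idea of data-driven computation can be illustrated with a toy sketch. This is not Whiz's API: the class `IntermediateStore`, its methods `on` and `put`, and the "three values per key" trigger are all hypothetical names and choices made up for illustration. The sketch only shows the general pattern the abstract describes, in which a store for intermediate data runs user-supplied monitors and raises an event that starts computation once the data meets a condition.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set, Tuple

Predicate = Callable[[str, List[int]], bool]
Handler = Callable[[str, List[int]], None]


class IntermediateStore:
    """Toy intermediate-data store with programmable monitors (hypothetical, not Whiz's API)."""

    def __init__(self) -> None:
        self.data: Dict[str, List[int]] = defaultdict(list)
        self.monitors: List[Tuple[Predicate, Handler]] = []
        # Remember which (monitor, key) pairs already fired so each event fires once.
        self.fired: Set[Tuple[int, str]] = set()

    def on(self, predicate: Predicate, handler: Handler) -> None:
        """Register a monitor: when predicate holds for a key, run handler on its values."""
        self.monitors.append((predicate, handler))

    def put(self, key: str, value: int) -> None:
        """Store a value, then check every monitor against the updated key."""
        self.data[key].append(value)
        for idx, (predicate, handler) in enumerate(self.monitors):
            if (idx, key) not in self.fired and predicate(key, self.data[key]):
                self.fired.add((idx, key))
                handler(key, list(self.data[key]))


results: Dict[str, int] = {}


def sum_values(key: str, values: List[int]) -> None:
    # The "computation" that the event triggers: aggregate what has arrived so far.
    results[key] = sum(values)


store = IntermediateStore()
# Assumed trigger for illustration: start computing once a key has three values.
store.on(lambda key, values: len(values) >= 3, sum_values)

for k, v in [("a", 1), ("b", 10), ("a", 2), ("a", 3), ("b", 20)]:
    store.put(k, v)

print(results)
```

In this sketch the producer never decides when the aggregation runs; the stored data does. Key `"a"` triggers the handler when its third value arrives, while key `"b"` never triggers it because only two values were written.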
Taxonomy
Topics: Cloud Computing and Resource Management · Graph Theory and Algorithms · Scientific Computing and Data Management
