Elastic Scheduling of Intermittent Query Processing in a Cluster Environment
Saranya Chandrasekaran, S. Sudarshan

TL;DR
This paper introduces elastic scheduling schemes for intermittent (batched) query processing in cluster environments. The schemes meet query deadlines at minimum cost while handling multiple concurrent queries and input-rate variations.
Contribution
It proposes elastic scheduling schemes for batched query processing that scale the number of cluster nodes up or down to meet deadlines at minimum cost.
Findings
The proposed scheduling schemes outperform fixed-node approaches in both cost and deadline compliance.
The schemes are implemented on top of Apache Spark in the AWS EMR environment, demonstrating their practicality.
Experiments with the TPC-H and Yahoo Streaming datasets validate the approach.
Abstract
Many applications process a stream of tuples over a window duration and require the results within a specified deadline after the end of the window. For such scenarios, processing tuples intermittently (in batches) instead of eagerly processing tuples as they arrive significantly reduces the overall cost. Earlier work on intermittent query processing has addressed only fixed environments. In this paper, we propose scheduling schemes for batched processing of tuples in an elastic parallel environment that can scale nodes up or down. Our scheduling schemes ensure that deadlines are met while incurring minimum cost. Our schemes also handle multiple concurrent queries, the arrival of new queries, and input rate variations. We have implemented our schemes on top of Apache Spark, in the AWS EMR environment, and evaluated performance with both TPC-H and Yahoo Streaming datasets. Our experimental…
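The cost trade-off behind intermittent processing can be illustrated with a toy model. This is not the authors' scheduling algorithm. It assumes (all hypothetical) that tuples arrive at a constant rate, that every batch pays a fixed start-up overhead on each node it holds, that throughput scales linearly with the number of nodes, and that cost is measured in node-seconds. The names `min_nodes`, `best_plan`, and their parameters are illustrative only. The sketch brute-forces the number of equal-sized batches per window. The final batch starts at the window end and must finish within the deadline, and it may use a different node count than the earlier batches, which is where elasticity enters.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Plan:
    batches: int          # number of equal-sized batches per window
    nodes: List[int]      # nodes used for each batch (elastic: the last may differ)
    node_seconds: float   # total cost, in node-seconds


def min_nodes(tuples: float, budget: float, overhead: float, rate: float) -> Optional[int]:
    """Smallest n with overhead + tuples / (rate * n) <= budget, or None if no n works."""
    usable = budget - overhead
    if usable <= 0:
        return None
    return max(1, math.ceil(tuples / (rate * usable)))


def best_plan(arrival_rate: float, window: float, deadline: float,
              overhead: float, rate: float, max_batches: int = 100) -> Optional[Plan]:
    """Brute-force search over the number of batches k (hypothetical toy cost model)."""
    total = arrival_rate * window
    best = None
    for k in range(1, max_batches + 1):
        size = total / k
        # Batches 1..k-1 must finish before the next batch is triggered.
        n_mid = min_nodes(size, window / k, overhead, rate) if k > 1 else 0
        # The final batch starts at the window end and must meet the deadline.
        n_last = min_nodes(size, deadline, overhead, rate)
        if n_mid is None or n_last is None:
            continue
        nodes = [n_mid] * (k - 1) + [n_last]
        # Each batch holds n nodes for overhead + size / (rate * n) seconds,
        # i.e. n * overhead + size / rate node-seconds; summed over all batches.
        cost = overhead * sum(nodes) + total / rate
        if best is None or cost < best.node_seconds:
            best = Plan(k, nodes, cost)
    return best


plan = best_plan(arrival_rate=1000, window=3600, deadline=60, overhead=10, rate=500)
print(plan)  # Plan(batches=6, nodes=[3, 3, 3, 3, 3, 24], node_seconds=7590.0)
```

With these example parameters, processing the whole window as a single batch at the window end needs 144 nodes (8640 node-seconds). Six batches, with the last one scaled up to 24 nodes, cost 7590 node-seconds. Moving toward many small, eager-style batches raises the cost again because the per-batch overhead is paid repeatedly. For example, 100 batches would cost 10190 node-seconds under the same model.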
