Cache-based Multi-query Optimization for Data-intensive Scalable Computing Frameworks

Pietro Michiardi; Damiano Carra; Sara Migliorini

arXiv:1805.08650 · cs.DB · May 23, 2018

TL;DR

This paper presents a cache-based multi-query optimization technique for large-scale distributed systems. By caching and sharing common subexpressions across a batch of queries, it avoids redundant processing and saves cluster resources.

Contribution

The paper introduces a method that combines in-memory cache primitives with multi-query optimization, formulated as a cost-based optimization problem under memory constraints, to improve the efficiency of data-intensive scalable computing frameworks.

Findings

01. Significant resource savings on TPC-DS workloads

02. Effective sharing of common subexpressions reduces computation

03. Prototype shows notable performance improvements

Abstract

In modern large-scale distributed systems, analytics jobs submitted by various users often share similar work, for example scanning and processing the same subset of data. Instead of optimizing jobs independently, which may result in redundant and wasteful processing, multi-query optimization techniques can be employed to save a considerable amount of cluster resources. In this work, we introduce a novel method combining in-memory cache primitives and multi-query optimization, to improve the efficiency of data-intensive, scalable computing frameworks. By careful selection and exploitation of common (sub)expressions, while satisfying memory constraints, our method transforms a batch of queries into a new, more efficient one which avoids unnecessary recomputations. To find feasible and efficient execution plans, our method uses a cost-based optimization formulation akin to the…
