Exploiting Opportunistic Physical Design in Large-scale Data Analytics
Jeff LeFevre, Jagan Sankaranarayanan, Hakan Hacigumus, Junichi Tatemura, Neoklis Polyzotis, Michael J. Carey

TL;DR
This paper shows how large-scale data analytics systems can reuse the materialized views they already produce as a by-product of fault tolerance to optimize later queries, reducing execution time by 61% on average.
Contribution
It presents a novel query-rewrite algorithm that efficiently utilizes opportunistic views, including those with UDFs, to optimize complex data analysis queries.
Findings
Average 61% reduction in execution time
Up to 100x performance improvement
Effective query optimization with real-world datasets
Abstract
Large-scale systems, such as MapReduce and Hadoop, perform aggressive materialization of intermediate job results in order to support fault tolerance. When jobs correspond to exploratory queries submitted by data analysts, these materializations yield a large set of materialized views that typically capture common computation among successive queries from the same analyst, or even across queries of different analysts who test similar hypotheses. We propose to treat these views as an opportunistic physical design and use them for the purpose of query optimization. We develop a novel query-rewrite algorithm that addresses the two main challenges in this context: how to search the large space of rewrites, and how to reason about views that contain UDFs (a common feature in large-scale data analytics). The algorithm, which provably finds the minimum-cost rewrite, is inspired by…
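The core idea in the abstract, choosing the cheapest way to answer a query given views left behind by earlier jobs, can be illustrated with a toy sketch. This is not the paper's algorithm: the operator names, the cost numbers, the flat view-read cost, and the rule that a view is usable only when its operators form an exact prefix of the query are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical per-operator costs; names and numbers are illustrative only.
OP_COST = {
    "scan_logs": 100.0,
    "extract_udf": 40.0,
    "filter": 10.0,
    "group_count": 25.0,
}
VIEW_READ_COST = 5.0  # assumed flat cost of reading any materialized view


@dataclass(frozen=True)
class View:
    name: str
    ops: tuple  # operator sequence whose output the view stores


def plan_cost(query_ops, view: Optional[View]) -> Optional[float]:
    """Cost of answering query_ops, optionally starting from a view.

    In this toy model a view is usable only if its operators are an exact
    prefix of the query's operator sequence; otherwise None is returned.
    """
    if view is None:
        return sum(OP_COST[op] for op in query_ops)
    k = len(view.ops)
    if tuple(query_ops[:k]) != view.ops:
        return None
    return VIEW_READ_COST + sum(OP_COST[op] for op in query_ops[k:])


def best_rewrite(query_ops, views):
    """Return (cost, view_name_or_None) for the cheapest plan found."""
    best = (plan_cost(query_ops, None), None)  # baseline: no view used
    for v in views:
        c = plan_cost(query_ops, v)
        if c is not None and c < best[0]:
            best = (c, v.name)
    return best


if __name__ == "__main__":
    query = ["scan_logs", "extract_udf", "filter", "group_count"]
    views = [
        View("v1", ("scan_logs", "extract_udf")),
        View("v2", ("scan_logs", "filter")),
    ]
    print(best_rewrite(query, views))
```

The sketch enumerates every candidate view exhaustively, which is only practical for a handful of views. The abstract names the size of the rewrite space and reasoning about UDFs as the two main challenges; neither is modeled here.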
Taxonomy
Topics: Data Management and Algorithms · Advanced Database Systems and Queries · Cloud Computing and Resource Management
