Adaptive filter ordering in Spark

Nikodimos Nikolaidis; Anastasios Gounaris

arXiv:1905.01349 · cs.DB · May 7, 2019 · 1 citation

Open Access

TL;DR

This paper introduces an adaptive method for reordering predicates in Spark data streams, improving execution efficiency by dynamically adjusting to changing data statistics, with an open-source prototype demonstrating its feasibility.

Contribution

It presents a novel engineering approach to make Spark's execution engine adaptive through predicate reordering based on evolving data statistics.

Findings

01. System extension is available as open-source.

02. Experimental results show manageable overhead.

03. Sensitivity to tuning parameters is observed.

Abstract

This report describes a technical methodology for making the Apache Spark execution engine adaptive. It presents the engineering solutions, which specifically target the adaptive reordering of predicates in data streams with evolving statistics. The system extension developed is available as an open-source prototype. Indicative experimental results show its overhead and its sensitivity to tuning parameters.
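
To make the idea of adaptive predicate reordering concrete, here is a minimal plain-Python sketch. It is not the authors' Spark extension and uses no Spark API. It assumes the widely used ranking heuristic of ordering predicates by drop rate per unit cost, (1 - selectivity) / cost, and it re-ranks every `reorder_every` records while halving the old counts so that recent data dominates. All names (`Predicate`, `AdaptiveFilter`, `reorder_every`, `decay`, `unit_cost`) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List


@dataclass
class Predicate:
    """One filter condition plus its running statistics (hypothetical helper)."""
    name: str
    fn: Callable[[Any], bool]
    unit_cost: float = 1.0   # assumed relative cost of one evaluation
    evaluated: float = 0.0   # decayed count of evaluations
    passed: float = 0.0      # decayed count of records that passed

    def rank(self) -> float:
        # Higher rank = evaluate earlier. Predicates without data rank last.
        if self.evaluated == 0:
            return 0.0
        selectivity = self.passed / self.evaluated
        return (1.0 - selectivity) / self.unit_cost


class AdaptiveFilter:
    """Conjunction of predicates whose evaluation order follows observed statistics."""

    def __init__(self, predicates: Iterable[Predicate],
                 reorder_every: int = 1000, decay: float = 0.5) -> None:
        self.predicates: List[Predicate] = list(predicates)
        self.reorder_every = reorder_every
        self.decay = decay
        self._seen = 0

    def accept(self, record: Any) -> bool:
        self._seen += 1
        result = True
        for p in self.predicates:
            p.evaluated += 1
            if p.fn(record):
                p.passed += 1
            else:
                result = False
                break  # short-circuit: later predicates are skipped
        if self._seen % self.reorder_every == 0:
            self._reorder()
        return result

    def _reorder(self) -> None:
        # Note: because of short-circuiting, the statistics of later predicates
        # are conditional on earlier ones passing. This sketch accepts that bias.
        self.predicates.sort(key=lambda p: p.rank(), reverse=True)
        for p in self.predicates:
            p.evaluated *= self.decay
            p.passed *= self.decay

    def order(self) -> List[str]:
        return [p.name for p in self.predicates]
```

The set of accepted records never depends on the evaluation order, only the amount of work per record does. When the stream shifts so that a different predicate rejects most records, that predicate moves to the front after a few re-ranking rounds. How quickly this happens depends on `reorder_every` and `decay`, which illustrates the kind of tuning sensitivity the abstract mentions.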

Taxonomy

Topics: Advanced Database Systems and Queries · Data Stream Mining Techniques · Time Series Analysis and Forecasting