Adaptive filter ordering in Spark
Nikodimos Nikolaidis, Anastasios Gounaris

TL;DR
This report describes an adaptive method for reordering filter predicates in Spark data streams as the data statistics change, and provides an open-source prototype together with indicative measurements of its overhead and its sensitivity to tuning parameters.
Contribution
It presents a novel engineering approach to make Spark's execution engine adaptive through predicate reordering based on evolving data statistics.
Findings
The system extension is available as an open-source prototype.
Indicative experimental results show that the overhead of the extension is manageable.
Performance is sensitive to the choice of tuning parameters.
Abstract
This report describes a technical methodology for making the Apache Spark execution engine adaptive. It presents the engineering solutions, which specifically target the adaptive reordering of predicates in data streams with evolving statistics. The system extension developed is available as an open-source prototype. Indicative experimental results show its overhead and sensitivity to tuning parameters.
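To make the idea of adaptive predicate reordering concrete, the following is a minimal sketch in plain Python, not the authors' Spark extension and not a Spark API. It assumes all predicates cost the same to evaluate, so the most selective predicate (lowest pass rate) should run first. The names `TrackedPredicate`, `AdaptiveFilter`, `window`, and `sample_every` are hypothetical and chosen only for illustration. Statistics are collected on sampled records that are checked against every predicate, so that short-circuit evaluation does not bias the measured pass rates.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class TrackedPredicate:
    """A predicate plus running counts (hypothetical helper, not a Spark class)."""
    name: str
    test: Callable[[Any], bool]
    seen: int = 0
    passed: int = 0

    def selectivity(self) -> float:
        # Fraction of profiled records that passed; 1.0 until any are observed.
        return self.passed / self.seen if self.seen else 1.0


class AdaptiveFilter:
    """Conjunctive filter that periodically re-sorts its predicates.

    Assumption: every predicate has equal cost, so ordering by ascending
    selectivity minimises the expected number of evaluations per record.
    """

    def __init__(self, predicates: List[TrackedPredicate],
                 window: int = 1000, sample_every: int = 1):
        self.predicates = list(predicates)
        self.window = window              # profiled records between reorderings
        self.sample_every = sample_every  # profile one record in this many
        self._count = 0
        self._samples = 0

    def _reorder(self) -> None:
        # Most selective predicate first, then start a fresh statistics window
        # so the ordering can follow changes in the stream.
        self.predicates.sort(key=lambda p: p.selectivity())
        for p in self.predicates:
            p.seen = 0
            p.passed = 0
        self._samples = 0

    def accept(self, record: Any) -> bool:
        self._count += 1
        if self._count % self.sample_every == 0:
            # Profiled record: evaluate all predicates (no short-circuit).
            outcomes = [p.test(record) for p in self.predicates]
            for p, ok in zip(self.predicates, outcomes):
                p.seen += 1
                p.passed += int(ok)
            self._samples += 1
            if self._samples >= self.window:
                self._reorder()
            return all(outcomes)
        # Normal record: all() over a generator stops at the first failure.
        return all(p.test(record) for p in self.predicates)
```

Calling `accept` on each incoming record returns the same result as a fixed-order conjunction; only the evaluation order changes. Here `window` and `sample_every` stand in for the kind of tuning parameters whose effect the report evaluates: a smaller window reacts faster to changing statistics but reorders more often, and profiling more records gives better estimates at higher cost.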
Taxonomy
Topics: Advanced Database Systems and Queries · Data Stream Mining Techniques · Time Series Analysis and Forecasting
