MFTune: An Efficient Multi-fidelity Framework for Spark SQL Configuration Tuning
Beicheng Xu, Lingching Tung, Yuchen Wang, Yupeng Lu, Bin Cui

TL;DR
MFTune is a novel multi-fidelity framework that efficiently tunes Spark SQL configurations by using query-based proxies and advanced optimization techniques, significantly reducing tuning time while improving performance.
Contribution
The paper introduces MFTune, a new multi-fidelity tuning framework with query-based fidelity partitioning and density-based search, tailored for Spark SQL configuration optimization.
Findings
Outperforms five state-of-the-art tuning methods on TPC benchmarks.
Identifies superior configurations within practical time constraints.
Reduces tuning time significantly while maintaining high performance.
Abstract
Apache Spark SQL is a cornerstone of modern big data analytics. However, optimizing Spark SQL performance is challenging due to its vast configuration space and the prohibitive cost of evaluating massive workloads. Existing tuning methods predominantly rely on full-fidelity evaluations, which are extremely time-consuming, often leading to suboptimal performance within practical budgets. While multi-fidelity optimization offers a potential solution, directly applying standard techniques, such as data volume reduction or early stopping, proves ineffective for Spark SQL, as they fail to preserve performance correlations or represent true system bottlenecks. To address these challenges, we propose MFTune, an efficient multi-fidelity framework that introduces a query-based fidelity partitioning strategy, utilizing representative SQL subsets to provide accurate, low-cost proxies. To navigate the…
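The idea of a query subset acting as a low-cost proxy can be illustrated with a small sketch. This is not MFTune's algorithm, which the truncated abstract does not specify. It assumes a hypothetical table runtimes[c][q] of per-query runtimes under a few sample configurations, and it uses an exhaustive search with Spearman rank correlation as a stand-in criterion for "preserving performance correlations". The function names and the toy numbers are made up for illustration.

```python
from itertools import combinations


def ranks(values):
    """Return 1-based ranks of values, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def pick_proxy_subset(runtimes, k):
    """Find the k-query subset whose total runtime ranks the sampled
    configurations most like the full workload does (exhaustive search,
    an assumption made for this sketch)."""
    n_queries = len(runtimes[0])
    full = [sum(row) for row in runtimes]
    best_subset, best_rho = None, -2.0
    for subset in combinations(range(n_queries), k):
        proxy = [sum(row[q] for q in subset) for row in runtimes]
        rho = spearman(proxy, full)
        if rho > best_rho:
            best_subset, best_rho = subset, rho
    return best_subset, best_rho


# Hypothetical runtimes in seconds: rows are configurations, columns are queries.
runtimes = [
    [10, 50, 5, 20],
    [12, 30, 5, 22],
    [9, 70, 5, 19],
    [11, 40, 5, 21],
]

best, rho = pick_proxy_subset(runtimes, k=1)
print(best, rho)
```

In this toy data, query 1 alone ranks the four configurations exactly as the full workload does, while queries 0 and 3 rank them in reverse order. A poorly chosen proxy can therefore mislead a tuner even though it is cheap to run.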
Taxonomy
Topics: Advanced Database Systems and Queries · Cloud Computing and Resource Management · Data Quality and Management
