MFTune: An Efficient Multi-fidelity Framework for Spark SQL Configuration Tuning

Beicheng Xu; Lingching Tung; Yuchen Wang; Yupeng Lu; Bin Cui

arXiv:2603.16450 · cs.DB · March 18, 2026

Open Access

TL;DR

MFTune is a multi-fidelity framework that tunes Spark SQL configurations using low-cost query-based proxies (representative SQL subsets) and density-based search, reducing tuning time while improving performance.

Contribution

The paper introduces MFTune, a new multi-fidelity tuning framework with query-based fidelity partitioning and density-based search, tailored for Spark SQL configuration optimization.

Findings

01 Outperforms five state-of-the-art tuning methods on TPC benchmarks.

02 Identifies superior configurations within practical time constraints.

03 Reduces tuning time significantly while maintaining high performance.

Abstract

Apache Spark SQL is a cornerstone of modern big data analytics. However, optimizing Spark SQL performance is challenging due to its vast configuration space and the prohibitive cost of evaluating massive workloads. Existing tuning methods predominantly rely on full-fidelity evaluations, which are extremely time-consuming, often leading to suboptimal performance within practical budgets. While multi-fidelity optimization offers a potential solution, directly applying standard techniques, such as data volume reduction or early stopping, proves ineffective for Spark SQL as they fail to preserve performance correlations or represent true system bottlenecks. To address these challenges, we propose MFTune, an efficient multi-fidelity framework that introduces a query-based fidelity partitioning strategy, utilizing representative SQL subsets to provide accurate, low-cost proxies. To navigate the…
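
To make the idea of a query-based low-fidelity proxy concrete, here is a minimal Python sketch. It is not MFTune's partitioning algorithm, which the abstract does not spell out. It assumes a hypothetical table `runtimes[query][i]` holding each query's measured runtime under the i-th sampled configuration. It then greedily picks the `k` queries whose summed runtime ranks configurations most like the full workload does, scored by Spearman rank correlation. The function names and the greedy criterion are illustrative assumptions.

```python
from typing import Dict, List, Sequence


def ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            result[order[pos]] = mean_rank
        i = j + 1
    return result


def spearman(a: Sequence[float], b: Sequence[float]) -> float:
    """Spearman correlation, computed as Pearson correlation of the ranks."""
    ra, rb = ranks(a), ranks(b)
    n = len(ra)
    mean_a, mean_b = sum(ra) / n, sum(rb) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(ra, rb))
    var_a = sum((x - mean_a) ** 2 for x in ra)
    var_b = sum((y - mean_b) ** 2 for y in rb)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant series carries no ranking information
    return cov / (var_a * var_b) ** 0.5


def pick_proxy_queries(runtimes: Dict[str, List[float]], k: int) -> List[str]:
    """Greedily choose k queries whose total runtime best tracks the full workload.

    runtimes[q][i] is the (hypothetical) runtime of query q under configuration i.
    """
    n_configs = len(next(iter(runtimes.values())))
    full = [sum(runtimes[q][i] for q in runtimes) for i in range(n_configs)]
    chosen: List[str] = []
    subset_total = [0.0] * n_configs
    for _ in range(min(k, len(runtimes))):
        best_query, best_rho = None, -2.0
        for q in runtimes:
            if q in chosen:
                continue
            candidate = [s + t for s, t in zip(subset_total, runtimes[q])]
            rho = spearman(candidate, full)
            if rho > best_rho:
                best_query, best_rho = q, rho
        chosen.append(best_query)
        subset_total = [s + t for s, t in zip(subset_total, runtimes[best_query])]
    return chosen
```

A tuner could evaluate most candidate configurations on the chosen subset and reserve full-workload runs for the most promising ones. Whether such a subset really preserves the ranking would need to be checked on held-out configurations.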

Taxonomy

Topics: Advanced Database Systems and Queries · Cloud Computing and Resource Management · Data Quality and Management