LOCAT: Low-Overhead Online Configuration Auto-Tuning of Spark SQL Applications
Jinhan Xin, Kai Hwang, Zhibin Yu

TL;DR
LOCAT is a Bayesian Optimization-based system that efficiently tunes Spark SQL configurations online, reducing overhead and adapting to different input data sizes, thereby significantly improving application performance.
Contribution
LOCAT introduces three techniques: Query Configuration Sensitivity Analysis (QCSA), a Datasize-Aware Gaussian Process (DAGP), and IICP. Together they enable low-overhead, data size-aware, and focused configuration tuning for Spark SQL applications.
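To make the "data size-aware" idea concrete, here is a minimal sketch of a Gaussian process regressor that takes the input data size as one more input feature alongside a configuration parameter. This is only one plausible reading of the term, not the paper's DAGP: the RBF kernel, the `length_scale` and `noise` values, the two-column feature layout, and all sample numbers are assumptions made for illustration.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=0.5):
    """RBF kernel between every row of a and every row of b."""
    sq_dist = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clamp tiny negative values caused by floating-point rounding.
    return np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)


def gp_posterior_mean(x_train, y_train, x_test, length_scale=0.5, noise=1e-6):
    """Posterior mean of a GP with a constant prior mean equal to mean(y_train)."""
    k = rbf_kernel(x_train, x_train, length_scale) + noise * np.eye(len(x_train))
    k_star = rbf_kernel(x_test, x_train, length_scale)
    y_mean = y_train.mean()
    alpha = np.linalg.solve(k, y_train - y_mean)
    return y_mean + k_star @ alpha


# Hypothetical samples. Column 0: one normalized configuration parameter.
# Column 1: normalized input data size (the assumed extra feature).
x_train = np.array([
    [0.2, 0.1],
    [0.8, 0.1],
    [0.2, 0.9],
    [0.8, 0.9],
    [0.5, 0.5],
])
# Hypothetical execution times in seconds for those samples.
y_train = np.array([30.0, 22.0, 210.0, 140.0, 95.0])

# Same configuration value, two different data sizes.
x_test = np.array([[0.6, 0.2], [0.6, 0.8]])
predictions = gp_posterior_mean(x_train, y_train, x_test)
print(predictions)
```

Because the data size is part of the model input, one surrogate can be queried at a data size it was never trained on, instead of fitting a separate model per data size.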
Findings
Accelerates optimization by 4.1x to 9.7x compared to state-of-the-art methods.
Improves Spark SQL application performance by 1.9x to 2.4x.
Works effectively across different clusters and input data sizes.
Abstract
Spark SQL has been widely deployed in industry but it is challenging to tune its performance. Recent studies try to employ machine learning (ML) to solve this problem, but suffer from two drawbacks. First, it takes a long time (high overhead) to collect training samples. Second, the optimal configuration for one input data size of the same application might not be optimal for others. To address these issues, we propose a novel Bayesian Optimization (BO) based approach named LOCAT to automatically tune the configurations of Spark SQL applications online. LOCAT innovates three techniques. The first technique, named QCSA, eliminates the configuration-insensitive queries by Query Configuration Sensitivity Analysis (QCSA) when collecting training samples. The second technique, dubbed DAGP, is a Datasize-Aware Gaussian Process (DAGP) which models the performance of an application as a…
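The query-filtering step described in the abstract can be sketched as follows. The abstract does not say how sensitivity is measured, so this sketch assumes a simple criterion: a query counts as configuration-insensitive when the coefficient of variation of its execution time across sampled configurations falls below a threshold. The metric, the `threshold` value, the query names, and the timings are all hypothetical.

```python
import statistics


def coefficient_of_variation(times):
    """Population standard deviation divided by the mean (0.0 if the mean is not positive)."""
    mean = statistics.mean(times)
    return statistics.pstdev(times) / mean if mean > 0 else 0.0


def select_sensitive_queries(times_by_query, threshold=0.1):
    """Keep queries whose run time varies noticeably across configurations."""
    return [
        query
        for query, times in times_by_query.items()
        if coefficient_of_variation(times) >= threshold
    ]


# Hypothetical execution times (seconds) of each query under four sampled configurations.
samples = {
    "q1": [10.0, 10.2, 9.9, 10.1],
    "q2": [40.0, 25.0, 61.0, 33.0],
    "q3": [5.0, 5.1, 5.0, 4.9],
}

sensitive = select_sensitive_queries(samples)
print(sensitive)
```

Under this assumption, only the queries that survive the filter would be rerun when collecting further training samples, which is where the reduction in sample-collection time would come from.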
Taxonomy
Topics: Metabolomics and Mass Spectrometry Studies · Cloud Computing and Resource Management · Spectroscopy Techniques in Biomedical and Chemical Research
