A Workload-Specific Memory Capacity Configuration Approach for In-Memory Data Analytic Platforms
Yi Liang, Shilu Chang, Chao Su

TL;DR
This paper introduces WSMC, a workload-specific memory configuration approach for Spark that accurately predicts memory needs, enabling significant memory savings with minimal performance impact.
Contribution
WSMC is a novel method that classifies workloads and predicts memory requirements considering multiple factors, improving memory efficiency for in-memory data analytics.
Findings
WSMC reduces memory usage by over 40% compared to default configurations.
WSMC achieves only 5% performance degradation with optimized memory settings.
Compared to manual tuning, WSMC slightly increases memory waste by 7% but improves workload performance by 1%.
Abstract
We propose WSMC, a workload-specific memory capacity configuration approach for Spark workloads, which guides users in configuring memory capacity by accurately predicting a workload's memory requirement under various input data sizes and parameter settings. First, WSMC classifies in-memory computing workloads into four categories according to each workload's Data Expansion Ratio. Second, WSMC establishes a memory requirement prediction model that considers the input data size, the shuffle data size, the parallelism of the workload, and the data block size. Finally, for each workload category, WSMC calculates the shuffle data size in the prediction model in a workload-specific way. For an ad-hoc workload, WSMC can profile its Data Expansion Ratio with small-sized input data and decide which category the workload falls into. Users can then determine the…
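The profiling and classification steps described in the abstract might be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not define the Data Expansion Ratio or give the category boundaries, so the ratio definition (in-memory size over input size), the `ASSUMED_BOUNDS` thresholds, and all function names here are hypothetical. The only real identifier is Spark's `spark.executor.memory` setting, which accepts size strings such as `1024m`.

```python
import math

# Hypothetical cut-offs. The paper uses four categories based on the
# Data Expansion Ratio, but this summary does not state the boundaries.
ASSUMED_BOUNDS = (0.5, 1.0, 2.0)


def data_expansion_ratio(input_bytes: int, in_memory_bytes: int) -> float:
    """Assumed definition: in-memory data size divided by input data size,
    measured on a small profiling run."""
    if input_bytes <= 0:
        raise ValueError("input_bytes must be positive")
    return in_memory_bytes / input_bytes


def classify(der: float, bounds=ASSUMED_BOUNDS) -> int:
    """Return a category index in 0..3: the number of assumed bounds
    that the ratio meets or exceeds."""
    return sum(der >= b for b in bounds)


def executor_memory_setting(required_bytes: int) -> str:
    """Format a predicted requirement (in bytes) as a value for Spark's
    spark.executor.memory, rounded up to whole mebibytes."""
    return f"{math.ceil(required_bytes / 2**20)}m"
```

A predicted requirement produced by whatever model is used could then be passed on the command line, for example `--conf spark.executor.memory=` followed by the string returned from `executor_memory_setting`.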
Taxonomy
Topics: Cloud Computing and Resource Management · Parallel Computing and Optimization Techniques · Advanced Data Storage Technologies
