Don't cry over spilled records: Memory elasticity of data-parallel applications and its application to cluster scheduling
Calin Iorgulescu, Florin Dinu, Aunn Raza, Wajih Ul Hassan, and Willy Zwaenepoel

TL;DR
This paper investigates memory elasticity in data-parallel workloads, demonstrating its prevalence and predictability across frameworks, and leverages it to improve cluster scheduling efficiency and reduce job completion times.
Contribution
It introduces the concept of memory elasticity in data-parallel tasks, quantifies its effects, and applies it to enhance cluster scheduling strategies.
Findings
Memory elasticity is common in Hadoop, Spark, Tez, and Flink.
Tasks can run with as little as 10% of ideal memory with moderate slowdown.
Scheduling with memory elasticity can reduce job completion time by up to 60%.
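The scheduling benefit comes from a trade-off: start a task now with less memory than it ideally wants and accept a slowdown, or wait for its full memory to free up. The following is a minimal, hypothetical sketch of that per-task decision. It is not the paper's scheduler or its performance model. The names `Task`, `step_penalty`, and `should_start_undersized`, the made-up 1.5x penalty, and the 10% memory floor are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical penalty model: maps the fraction of ideal memory granted
# (0 < fraction <= 1) to a slowdown factor >= 1. Illustrative only.
Penalty = Callable[[float], float]


@dataclass
class Task:
    ideal_mem_gb: float    # memory at which the task runs without spilling
    base_runtime_s: float  # runtime when given ideal_mem_gb


def step_penalty(fraction: float) -> float:
    """Made-up penalty: no slowdown with full memory, 1.5x otherwise."""
    return 1.0 if fraction >= 1.0 else 1.5


def finish_time_elastic(task: Task, free_mem_gb: float, penalty: Penalty) -> float:
    """Time until completion if the task starts now with the free memory."""
    fraction = min(free_mem_gb / task.ideal_mem_gb, 1.0)
    return task.base_runtime_s * penalty(fraction)


def finish_time_wait(task: Task, wait_s: float) -> float:
    """Time until completion if the task waits wait_s seconds for full memory."""
    return wait_s + task.base_runtime_s


def should_start_undersized(
    task: Task,
    free_mem_gb: float,
    wait_s: float,
    penalty: Penalty = step_penalty,
    min_fraction: float = 0.1,
) -> bool:
    """Greedy check: start now under-sized only if that finishes sooner."""
    if free_mem_gb / task.ideal_mem_gb < min_fraction:
        return False  # too little memory; an assumed floor, not from the paper
    return finish_time_elastic(task, free_mem_gb, penalty) < finish_time_wait(task, wait_s)


if __name__ == "__main__":
    reducer = Task(ideal_mem_gb=10.0, base_runtime_s=100.0)
    print(should_start_undersized(reducer, free_mem_gb=2.0, wait_s=120.0))  # True (150 s vs 220 s)
    print(should_start_undersized(reducer, free_mem_gb=2.0, wait_s=30.0))   # False (150 s vs 130 s)
```

This compares a single task in isolation. A real cluster scheduler would also weigh how the choice affects other queued tasks and would need a realistic penalty estimate per task.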
Abstract
Understanding the performance of data-parallel workloads when resource-constrained has significant practical importance but unfortunately has received only limited attention. This paper identifies, quantifies and demonstrates memory elasticity, an intrinsic property of data-parallel tasks. Memory elasticity allows tasks to run with significantly less memory than they would ideally want while only paying a moderate performance penalty. For example, we find that given as little as 10% of ideal memory, PageRank and NutchIndexing Hadoop reducers become only 1.2x/1.75x and 1.08x slower. We show that memory elasticity is prevalent in the Hadoop, Spark, Tez and Flink frameworks. We also show that memory elasticity is predictable in nature by building simple models for Hadoop and extending them to Tez and Spark. To demonstrate the potential benefits of leveraging memory elasticity, this paper…
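One common reason a task can keep running with far less memory, hinted at by the title's "spilled records", is that it writes sorted runs of intermediate data to disk when its buffer fills and merges them back afterwards (an external sort). The following is a simplified, hypothetical sketch of that general spill-and-merge pattern, not Hadoop's actual reducer code. The function `sort_with_budget`, the record-count budget, and the use of Python lists in place of on-disk spill files are all illustrative assumptions.

```python
import heapq
from typing import Iterable, List, Tuple

Record = Tuple[str, int]  # (key, value) pair, as a reducer might see it


def sort_with_budget(records: Iterable[Record], budget: int) -> Tuple[List[Record], int]:
    """Sort records while holding at most `budget` records in memory.

    Returns the sorted records and the number of spilled runs (0 if
    everything fit in memory). Lists stand in for on-disk spill files.
    """
    if budget < 1:
        raise ValueError("budget must be at least one record")
    runs: List[List[Record]] = []
    buffer: List[Record] = []
    for rec in records:
        if len(buffer) == budget:
            runs.append(sorted(buffer))  # buffer full: spill a sorted run
            buffer = []
        buffer.append(rec)
    if not runs:
        return sorted(buffer), 0  # fit in memory: no spill, no merge pass
    runs.append(sorted(buffer))  # final partial buffer becomes the last run
    # Merge pass: stream a k-way merge over all sorted runs.
    return list(heapq.merge(*runs)), len(runs)


if __name__ == "__main__":
    data = [(f"k{i % 7}", i) for i in range(100)]
    for budget in (100, 25, 10):
        out, spills = sort_with_budget(data, budget)
        print(budget, spills, out == sorted(data))
```

With budgets of 100, 25, and 10 records the sketch produces 0, 4, and 10 runs, and the sorted output is identical each time. Shrinking memory changes only how much data is written out and read back; how large that extra I/O cost is in real frameworks is what the paper measures.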
Taxonomy
Topics: Cloud Computing and Resource Management · Distributed and Parallel Computing Systems · Parallel Computing and Optimization Techniques
