Lifetime-Based Memory Management for Distributed Data Processing Systems
Lu Lu, Xuanhua Shi, Yongluan Zhou, Xiong Zhang, Hai Jin, Cheng Pei, Ligang He, Yuanzhen Geng

TL;DR
This paper introduces Deca, a lifetime-based memory management framework for distributed data processing systems such as Spark. By analyzing the lifetimes of data objects, Deca significantly reduces garbage collection overhead and improves performance.
Contribution
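It proposes a lifetime analysis approach that infers the expected lifetime of data objects from user-defined functions and data types, and implements it on Spark as Deca, which allocates and releases memory according to those lifetimes to improve scalability and efficiency.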
Findings
Reduces garbage collection time by up to 99.9%
Achieves up to 22.7x speedup without data spilling
Consumes up to 46.6% less memory
Abstract
In-memory caching of intermediate data and eager combining of data in shuffle buffers have been shown to be very effective in minimizing the re-computation and I/O cost in distributed data processing systems like Spark and Flink. However, it has also been widely reported that these techniques would create a large amount of long-living data objects in the heap, which may quickly saturate the garbage collector, especially when handling a large dataset, and hence would limit the scalability of the system. To eliminate this problem, we propose a lifetime-based memory management framework, which, by automatically analyzing the user-defined functions and data types, obtains the expected lifetime of the data objects, and then allocates and releases memory space accordingly to minimize the garbage collection overhead. In particular, we present Deca, a concrete implementation of our proposal on…
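The abstract's core idea is that objects sharing an expected lifetime can be allocated together and released together. A region-style allocator illustrates this. The following is a minimal Python sketch under our own assumptions, not Deca's implementation (which targets Spark on the JVM). `LifetimeRegion`, the 16-byte `(key, value)` record layout, and the `page_size` parameter are all hypothetical. Records whose lifetime ends at the same point, such as the contents of one cached partition or one shuffle buffer, are packed into a few large byte pages instead of being kept as many small heap objects, and `release()` drops them all at once.

```python
import struct

# Hypothetical fixed-size record: a little-endian 64-bit int key and a 64-bit float value.
RECORD = struct.Struct("<qd")


class LifetimeRegion:
    """Hypothetical container for records that share one expected lifetime,
    e.g. one cached partition or one shuffle buffer. Records are packed into
    a few fixed-size byte pages instead of living as separate heap objects."""

    def __init__(self, page_size=4096):
        if page_size < RECORD.size:
            raise ValueError("page_size must fit at least one record")
        self.records_per_page = page_size // RECORD.size
        # Round the page down to a whole number of records.
        self.page_size = self.records_per_page * RECORD.size
        self.pages = []
        self.count = 0
        self.released = False

    def append(self, key, value):
        """Serialize one record into the current page, opening a new page when full."""
        if self.released:
            raise RuntimeError("region already released")
        slot = self.count % self.records_per_page
        if slot == 0:
            self.pages.append(bytearray(self.page_size))
        RECORD.pack_into(self.pages[-1], slot * RECORD.size, key, value)
        self.count += 1

    def __iter__(self):
        """Decode records on demand; no per-record objects are stored."""
        if self.released:
            raise RuntimeError("region already released")
        for i in range(self.count):
            page, slot = divmod(i, self.records_per_page)
            yield RECORD.unpack_from(self.pages[page], slot * RECORD.size)

    def release(self):
        """End of lifetime: drop every page in one step instead of freeing records one by one."""
        self.pages.clear()
        self.count = 0
        self.released = True


if __name__ == "__main__":
    region = LifetimeRegion(page_size=64)  # 64 // 16 = 4 records per page
    for k in range(10):
        region.append(k, k * 0.5)
    print(len(region.pages))          # 3 pages hold all 10 records
    print(sum(v for _, v in region))  # 22.5
    region.release()
```

Under a tracing collector such as the JVM's, a region like this presents a handful of large byte arrays rather than millions of small long-living objects, which is the kind of reduction in traced objects the abstract targets. CPython relies mainly on reference counting, so the sketch demonstrates the lifetime-grouped layout rather than the garbage collection savings themselves.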
Taxonomy
Topics: Cloud Computing and Resource Management · Advanced Data Storage Technologies · Parallel Computing and Optimization Techniques
