Analysis of Server Throughput For Managed Big Data Analytics Frameworks
Emmanouil Anagnostakis, Polyvios Pratikakis

TL;DR
This paper analyzes how reducing garbage collection and serialization/deserialization overhead in big data frameworks like Spark and Giraph can improve server throughput by optimizing memory management and CPU utilization.
Contribution
It uses TeraHeap, a system that moves objects to fast storage to reduce GC and S/D overhead, and evaluates its impact on server throughput under various memory configurations.
Findings
Reducing GC and S/D overhead improves CPU utilization.
TeraHeap enhances server throughput in memory-bound scenarios.
Memory distribution strategies affect performance outcomes.
Abstract
Managed big data frameworks, such as Apache Spark and Giraph, demand a large amount of memory per core to process massive datasets effectively. The memory pressure that arises from big data processing leads to high garbage collection (GC) overhead. Big data analytics frameworks attempt to remove this overhead by offloading objects to storage devices. At the same time, infrastructure providers, trying to address the same problem, allocate more memory per instance, leaving cores underutilized. For frameworks, avoiding GC by offloading to storage devices leads to high serialization/deserialization (S/D) overhead. For infrastructure, the result is lower resource utilization. These limitations prevent managed big data frameworks from effectively utilizing the CPU, leading to low server throughput. We conduct a methodological analysis of…
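The abstract's point that giving each instance more memory leaves cores underutilized can be illustrated with a toy capacity model. The sketch below is not taken from the paper: the function name, the server size, and the per-instance core count are all hypothetical, and it only counts how many instances fit on a server when each one reserves a fixed amount of memory and cores.

```python
def colocated_instances(server_mem_gb, server_cores,
                        mem_per_instance_gb, cores_per_instance):
    """Toy model (an assumption, not the paper's method).

    Returns (instances, busy_cores, core_utilization) for a server
    where each instance reserves a fixed amount of memory and cores.
    """
    # The number of instances is capped by whichever resource runs out first.
    by_memory = server_mem_gb // mem_per_instance_gb
    by_cores = server_cores // cores_per_instance
    instances = min(by_memory, by_cores)
    busy_cores = instances * cores_per_instance
    return instances, busy_cores, busy_cores / server_cores


if __name__ == "__main__":
    # Hypothetical server: 256 GB of DRAM, 32 cores, 4 cores per instance.
    for mem_gb in (32, 64, 128):
        print(mem_gb, colocated_instances(256, 32, mem_gb, 4))
```

In this hypothetical setup, doubling the memory reserved per instance halves the number of instances that fit, so fewer cores have work to do even though no core count changed. Reducing the memory each instance needs is what lets more instances share the server.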
Taxonomy
Topics: Cloud Computing and Resource Management · Advanced Data Storage Technologies · Big Data and Digital Economy
