Memory at Your Service: Fast Memory Allocation for Latency-critical Services
Aidi Pi, Junxian Zhao, Shaoqi Wang, Xiaobo Zhou

TL;DR
This paper introduces Hermes, a user-space memory allocator that reduces memory allocation latency and tail query latency for latency-critical services under memory pressure by combining fast allocation with proactive memory reclamation.
Contribution
Hermes is a novel user-space memory allocation mechanism that adaptively reserves memory and proactively reclaims it, improving latency-critical service performance in multi-tenant datacenters.
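The idea of reserving memory ahead of demand can be illustrated with a small sketch. This is not Hermes's implementation, which this summary does not describe in detail. It only shows, under simplifying assumptions, why pre-faulting pages moves the cost of obtaining physical memory off the request path. The class name ReservedPool and its methods are hypothetical, and the fixed reserve size stands in for whatever adaptive policy the paper uses.

```python
import mmap

PAGE = mmap.PAGESIZE


class ReservedPool:
    """Hypothetical pool that pre-faults anonymous memory up front.

    Illustrative only: a fixed-size reserve with bump allocation,
    not the adaptive mechanism described in the paper.
    """

    def __init__(self, reserve_pages):
        self.size = reserve_pages * PAGE
        # Anonymous mapping: address space is reserved, but pages are
        # typically not backed by physical memory until first touched.
        self.buf = mmap.mmap(-1, self.size)
        # Write one byte per page so the kernel backs each page now,
        # instead of during a latency-sensitive request later.
        for off in range(0, self.size, PAGE):
            self.buf[off] = 0
        self.next = 0

    def alloc(self, nbytes):
        """Return a writable view of nbytes, or None if the reserve is exhausted."""
        if self.next + nbytes > self.size:
            return None
        start = self.next
        self.next += nbytes
        return memoryview(self.buf)[start:start + nbytes]

    def remaining(self):
        """Bytes still available in the reserve."""
        return self.size - self.next


pool = ReservedPool(reserve_pages=4)
view = pool.alloc(100)
view[0] = 7  # writes land in memory that was already faulted in
```

A real allocator would also need to refill the reserve in the background and return memory under pressure; those parts are omitted here.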
Findings
Hermes reduces memory allocation latency by up to 54.4%.
Hermes decreases tail query latency by up to 40.3%.
Hermes lowers SLO violation rates by up to 84.3%.
Abstract
Co-location and memory sharing between latency-critical services, such as key-value stores and web search, and best-effort batch jobs is an appealing approach to improving memory utilization in multi-tenant datacenter systems. However, we find that the very diverse goals of job co-location and the GNU/Linux system stack can lead to severe performance degradation of latency-critical services under memory pressure in a multi-tenant system. We address memory pressure for latency-critical services via fast memory allocation and proactive reclamation. We find that memory allocation latency dominates the overall query latency, especially under memory pressure. We analyze the default memory management mechanism provided by the GNU/Linux system stack and identify the reasons why it is inefficient for latency-critical services in a multi-tenant system. We present Hermes, a fast memory allocation…
Taxonomy
Topics: Cloud Computing and Resource Management · Parallel Computing and Optimization Techniques · Distributed and Parallel Computing Systems
