Async-fork: Mitigating Query Latency Spikes Incurred by the Fork-based Snapshot Mechanism from the OS Level
Pu Pang, Gang Deng, Kaihao Bai, Quan Chen, Shixuan Sun, Bo Liu, Yu Xu, Hongbo Yao, Zhengheng Wang, Xiyu Wang, Zheng Liu, Zhuo Song, Yong Yang, Tao Ma, Minyi Guo

TL;DR
This paper introduces Async-fork, a kernel-level optimization that significantly reduces query latency spikes during snapshot creation in in-memory key-value stores by offloading page table copying to the child process.
Contribution
The paper proposes Async-fork, a novel OS-level technique that offloads page table copying during fork from the parent to the child process to mitigate latency spikes in in-memory key-value stores (IMKVSes), and implements and evaluates it with Redis.
Findings
Async-fork reduces tail latency by up to 99.84%.
It mitigates the latency spikes seen by queries that arrive while a snapshot is being taken.
It is implemented in the Linux kernel and deployed with Redis in public clouds.
Abstract
In-memory key-value stores (IMKVSes) serve many online applications because of their efficiency. To support data backup, popular industrial IMKVSes periodically take a point-in-time snapshot of the in-memory data with the system call fork. However, this mechanism can result in latency spikes for queries arriving during the snapshot period, because fork puts the engine into kernel mode, during which the engine cannot serve queries. In contrast to existing research that focuses on optimizing snapshot algorithms, we optimize the fork operation to address the latency spike problem at the operating system (OS) level, while keeping the data persistence mechanism in IMKVSes unchanged. Specifically, we first conduct an in-depth study to reveal the impact of the fork operation, as well as of the optimization techniques, on query latency. Based on findings in the study, we propose Async-fork to…
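The fork-based snapshot mechanism described in the abstract can be sketched in user space as follows. This is a minimal illustration on a Unix-like system, not the paper's implementation and not Redis code: the names `snapshot_with_fork` and `demo`, the JSON file format, and the toy dictionary store are all assumptions made for the example. It shows the parent forking, the child persisting the state as it was at fork time, and the parent continuing to accept writes. The time the parent spends inside the fork call is returned because that is the window the abstract identifies as the source of latency spikes.

```python
import json
import os
import tempfile
import time


def snapshot_with_fork(store, path):
    """Persist a point-in-time copy of `store` to `path` from a forked child.

    Returns the child's pid and the seconds the parent spent in the fork call.
    (Hypothetical helper for illustration only.)
    """
    start = time.perf_counter()
    pid = os.fork()  # the parent cannot serve queries while this call runs
    if pid == 0:
        # Child: sees the store as it was at fork time (copy-on-write memory).
        status = 0
        try:
            with open(path, "w") as f:
                json.dump(store, f)
        except Exception:
            status = 1
        os._exit(status)  # exit immediately, skipping the parent's cleanup
    fork_seconds = time.perf_counter() - start
    return pid, fork_seconds


def demo():
    store = {f"key{i}": i for i in range(1000)}
    expected = dict(store)  # state at snapshot time, kept for comparison
    fd, path = tempfile.mkstemp(suffix=".json")
    os.close(fd)
    try:
        pid, fork_seconds = snapshot_with_fork(store, path)
        # Parent keeps handling writes while the child writes the snapshot.
        store["key0"] = -1
        store["new_key"] = 42
        _, wait_status = os.waitpid(pid, 0)
        with open(path) as f:
            snapshot = json.load(f)
    finally:
        os.unlink(path)
    return snapshot == expected, os.WEXITSTATUS(wait_status), fork_seconds
```

Calling `demo()` returns whether the snapshot matches the pre-fork state, the child's exit status, and the measured fork duration. With a store this small the fork duration is negligible; the abstract's concern is engines holding large amounts of memory, where the kernel-mode work done during fork is much larger.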
Taxonomy
Topics: Cloud Computing and Resource Management · Caching and Content Delivery · Advanced Data Storage Technologies
