Fast Failure Recovery for Main-Memory DBMSs on Multicores
Yingjun Wu, Wentian Guo, Chee-Yong Chan, Kian-Lee Tan

TL;DR
PACMAN is a parallel recovery mechanism for main-memory DBMSs that uses application semantics, together with static and dynamic analysis, to speed up failure recovery under lightweight transaction-level logging, without adding costly logging overhead to normal transaction processing.
Contribution
The paper introduces PACMAN, a recovery method that combines static and dynamic analysis to parallelize log recovery in main-memory DBMSs that use coarse-grained, transaction-level logging.
Findings
Significantly reduces recovery time compared to existing methods
Maintains high transaction processing efficiency during normal operation
Effectively exploits application semantics for parallel recovery
Abstract
Main-memory database management systems (DBMSs) can achieve excellent performance when processing massive volumes of online transactions on modern multi-core machines. But existing durability schemes, namely tuple-level and transaction-level logging-and-recovery mechanisms, either degrade the performance of transaction processing or slow down the process of failure recovery. In this paper, we show that, by exploiting application semantics, it is possible to achieve speedy failure recovery without introducing any costly logging overhead to the execution of concurrent transactions. We propose PACMAN, a parallel database recovery mechanism that is specifically designed for lightweight, coarse-grained transaction-level logging. PACMAN leverages a combination of static and dynamic analyses to parallelize the log recovery: at compile time, PACMAN decomposes stored procedures by carefully…
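The trade-off the abstract describes between tuple-level and transaction-level logging can be illustrated with a small sketch. This is not PACMAN itself and does not show its static or dynamic analysis; it only contrasts the two logging granularities on a toy in-memory key-value store. The `transfer` procedure, the `PROCEDURES` registry, and all function names are hypothetical and chosen for illustration.

```python
# Toy contrast of tuple-level vs. transaction-level (command) logging.
# All names here are illustrative assumptions, not taken from the paper.

def transfer(db, src, dst, amount):
    """Hypothetical stored procedure: move `amount` from src to dst."""
    db[src] -= amount
    db[dst] += amount

PROCEDURES = {"transfer": transfer}

def run_with_tuple_log(db, log, name, *args):
    """Execute a procedure and log one after-image per modified tuple."""
    before = dict(db)
    PROCEDURES[name](db, *args)
    for key, value in db.items():
        if before.get(key) != value:
            log.append((key, value))

def run_with_command_log(db, log, name, *args):
    """Execute a procedure and log only its name and parameters."""
    PROCEDURES[name](db, *args)
    log.append((name, args))

def recover_from_tuple_log(checkpoint, log):
    """Reinstall logged after-images on top of the checkpoint."""
    db = dict(checkpoint)
    for key, value in log:
        db[key] = value
    return db

def recover_from_command_log(checkpoint, log):
    """Re-execute every logged procedure in commit order."""
    db = dict(checkpoint)
    for name, args in log:
        PROCEDURES[name](db, *args)
    return db
```

In this sketch the command log holds one small record per transaction, which keeps logging cheap at run time, but recovery has to re-execute each procedure in its original order. The tuple log is larger, but its records can be installed without re-running any procedure logic. The serial replay loop in `recover_from_command_log` is the step that, according to the abstract, PACMAN aims to parallelize.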
Taxonomy
Topics: Distributed systems and fault tolerance · Cloud Computing and Resource Management · Software System Performance and Reliability
