Engineering Record And Replay For Deployability: Extended Technical Report
Robert O'Callahan, Chris Jones, Nathan Froyd, Kyle Huey, Albert Noll, Nimrod Partush

TL;DR
This paper presents 'rr', a practical, low-overhead record-and-replay system that runs entirely in user space on stock hardware. It enables reverse-execution debugging and forensic analysis without kernel modifications, pervasive code instrumentation, or changes to compilers and runtimes.
Contribution
It demonstrates that a user-space record-and-replay system can be built without kernel modifications or pervasive instrumentation, provided the CPU and operating system satisfy certain constraints.
Findings
'rr' records and replays low-parallelism workloads with low overhead
Operates entirely in user space using stock hardware and software
Identifies hardware and OS constraints necessary for deployment
Abstract
The ability to record and replay program executions with low overhead enables many applications, such as reverse-execution debugging, debugging of hard-to-reproduce test failures, and "black box" forensic analysis of failures in deployed systems. Existing record-and-replay approaches limit deployability by recording an entire virtual machine (heavyweight), modifying the OS kernel (adding deployment and maintenance costs), requiring pervasive code instrumentation (imposing significant performance and complexity overhead), or modifying compilers and runtime systems (limiting generality). We investigated whether it is possible to build a practical record-and-replay system avoiding all these issues. The answer turns out to be yes, if the CPU and operating system meet certain non-obvious constraints. Fortunately, modern Intel CPUs, Linux kernels and user-space frameworks do meet these…
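As a rough illustration of the general idea behind record and replay (not rr's actual mechanism, which this summary does not describe), the sketch below routes every nondeterministic operation of a toy program through a single interface. In record mode the operation runs for real and its result is logged; in replay mode the logged result is returned instead, so the program recomputes exactly the same outputs. All names here (`Recorder`, `Replayer`, `program`) are hypothetical. A real system must also capture other sources of nondeterminism, such as thread interleavings and asynchronous signals, which this toy ignores.

```python
import random
import time
from typing import Any, Callable, List, Tuple

Log = List[Tuple[str, Any]]


class Recorder:
    """Record mode: perform each nondeterministic operation and log its result."""

    def __init__(self) -> None:
        self.log: Log = []

    def call(self, name: str, fn: Callable[[], Any]) -> Any:
        result = fn()
        self.log.append((name, result))
        return result


class Replayer:
    """Replay mode: return logged results instead of performing the operation."""

    def __init__(self, log: Log) -> None:
        self.log = log
        self.pos = 0

    def call(self, name: str, fn: Callable[[], Any]) -> Any:
        if self.pos >= len(self.log):
            raise RuntimeError("replay diverged: log exhausted")
        logged_name, result = self.log[self.pos]
        if logged_name != name:
            raise RuntimeError(f"replay diverged: expected {logged_name}, got {name}")
        self.pos += 1
        return result  # fn is deliberately not called during replay


def program(io) -> List[float]:
    """Toy program whose only nondeterminism flows through io.call."""
    out: List[float] = []
    start = io.call("time", time.time)
    for _ in range(3):
        out.append(io.call("random", random.random) * 2)
    out.append(io.call("time", time.time) - start)
    return out


rec = Recorder()
recorded = program(rec)
replayed = program(Replayer(rec.log))
assert replayed == recorded  # replay reproduces the recorded run exactly
```

The name check in `Replayer.call` is a cheap way to detect divergence: if the replayed program requests a different operation than the one logged at that position, continuing would feed it the wrong value, so the sketch fails loudly instead.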
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Space Satellite Systems and Control · Real-Time Systems Scheduling
