Execution replay and debugging
Michiel Ronsse, Koen De Bosschere, Jacques Chassin de Kergommeaux

TL;DR
This paper surveys execution replay techniques essential for debugging non-deterministic parallel and distributed programs, highlighting methods to record and replay execution flows efficiently.
Contribution
It provides a comprehensive overview of existing execution replay techniques and tools, emphasizing their approaches to balancing recording detail and performance.
Findings
Various execution replay methods are compared and categorized.
Trade-offs between recording detail and performance are analyzed.
The survey identifies gaps and future directions in execution replay technology.
Abstract
As most parallel and distributed programs are internally non-deterministic -- consecutive runs with the same input might result in a different program flow -- vanilla cyclic debugging techniques as such are useless. In order to use cyclic debugging tools, we need a tool that records information about an execution so that it can be replayed for debugging. Because recording information interferes with the execution, we must limit the amount of information and keep the processing of the information fast. This paper contains a survey of existing execution replay techniques and tools.
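To illustrate the record-then-replay idea described in the abstract, here is a minimal Python sketch. It is not taken from the paper or from any surveyed tool. It assumes the only source of non-determinism is the order in which threads enter one critical section, and it records just that order, in keeping with the goal of logging as little as possible. The names `ReplayLock`, `section`, and `run` are hypothetical.

```python
import threading
from contextlib import contextmanager


class ReplayLock:
    """Mutex that records the order of entries, or enforces a recorded order.

    Hypothetical helper for illustration only.
    """

    def __init__(self, recorded=None):
        self.replaying = recorded is not None
        self.order = list(recorded) if self.replaying else []
        self._next = 0  # index of the next entry to allow during replay
        self._cond = threading.Condition()

    def _my_turn(self, worker_id):
        return self._next < len(self.order) and self.order[self._next] == worker_id

    @contextmanager
    def section(self, worker_id):
        # The condition's lock is held for the whole critical section,
        # except while a replaying thread waits for its turn.
        with self._cond:
            if self.replaying:
                self._cond.wait_for(lambda: self._my_turn(worker_id))
            else:
                self.order.append(worker_id)  # record phase: log who got in
            try:
                yield
            finally:
                if self.replaying:
                    self._next += 1
                    self._cond.notify_all()


def run(lock, n_workers=4, steps=3):
    """Run a small racy workload and return the observed interleaving."""
    trace = []

    def worker(wid):
        for _ in range(steps):
            with lock.section(wid):
                trace.append(wid)

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return trace


if __name__ == "__main__":
    recorder = ReplayLock()
    original = run(recorder)                       # record phase
    replayed = run(ReplayLock(recorder.order))     # replay phase
    print(original == replayed)
```

The log here holds one small integer per critical-section entry rather than the data each thread produced, which is one way to keep recording overhead low. Real replay systems must also handle other sources of non-determinism, such as message arrival order and input.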
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Distributed and Parallel Computing Systems · Distributed systems and fault tolerance
