Lightweight Record-and-Replay for Intermittent Tests Failures
Omar S Navarro Leija, Alan Jeffrey

TL;DR
This paper introduces lightweight record-and-replay (RR), a method that reduces nondeterminism in concurrent, message-passing programs and thereby decreases intermittent test failures. It is evaluated on the Servo web browser, where log sizes are small but performance overhead remains a work in progress.
Contribution
The paper presents a novel lightweight RR approach that targets thread communication nondeterminism, differing from traditional fully deterministic RR solutions.
Findings
Greatly reduces intermittent failures for some tests, but not others
Log sizes remain small, indicating low storage overhead
Performance overhead is still being optimized
Abstract
In this paper we present lightweight record-and-replay (RR). In contrast to traditional "fully deterministic" RR solutions, lightweight RR focuses on handling nondeterminism arising from thread communication for programs with concurrent, message-passing architectures. By decreasing nondeterminism in programs, lightweight RR decreases the number of intermittent failures in a program's test suite. We evaluated the effectiveness of lightweight RR on Servo, a highly concurrent web browser. Our evaluation shows lightweight RR is effective at greatly reducing intermittent failures for some tests, but not others. Lightweight RR performance overhead remains a work in progress, but log sizes are quite small. We believe that with further work lightweight RR could prove useful for lowering nondeterminism in programs at a negligible performance overhead.
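To make the idea in the abstract concrete, the following is a minimal sketch of recording and replaying the order of message arrivals on a shared channel. It is not the authors' implementation and does not use Servo's channel types. The function names (`run_producers`, `record`, `replay`) and the log format (one sender id per receive) are assumptions made purely for illustration, using Python's standard `queue` and `threading` modules.

```python
import queue
import threading
from collections import defaultdict, deque


def run_producers(channel, messages_by_sender):
    """Start one thread per sender; each puts (sender_id, payload) on the channel."""
    def produce(sender_id, payloads):
        for payload in payloads:
            channel.put((sender_id, payload))

    threads = [
        threading.Thread(target=produce, args=(sender_id, payloads))
        for sender_id, payloads in messages_by_sender.items()
    ]
    for t in threads:
        t.start()
    return threads


def record(messages_by_sender):
    """Receive in whatever order the scheduler produces; log only the sender ids."""
    channel = queue.Queue()
    threads = run_producers(channel, messages_by_sender)
    total = sum(len(payloads) for payloads in messages_by_sender.values())
    log, received = [], []
    for _ in range(total):
        sender_id, payload = channel.get()
        log.append(sender_id)  # hypothetical log entry: which sender this receive saw
        received.append((sender_id, payload))
    for t in threads:
        t.join()
    return log, received


def replay(messages_by_sender, log):
    """Deliver messages in the logged sender order, buffering early arrivals."""
    channel = queue.Queue()
    threads = run_producers(channel, messages_by_sender)
    pending = defaultdict(deque)  # messages that arrived before their turn
    received = []
    for expected_sender in log:
        # Block until a message from the expected sender is available.
        while not pending[expected_sender]:
            sender_id, payload = channel.get()
            pending[sender_id].append(payload)
        received.append((expected_sender, pending[expected_sender].popleft()))
    for t in threads:
        t.join()
    return received
```

In this sketch the log holds only one sender id per receive, so it stays small relative to the message payloads. The replay side pays for determinism by blocking until the expected sender's message arrives. The same trade-off between log size and runtime overhead is what the abstract describes, though the sketch says nothing about the actual costs measured on Servo.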
Taxonomy
Topics: Distributed systems and fault tolerance · Software System Performance and Reliability · Cloud Computing and Resource Management
