Trace Validation of Unmodified Concurrent Systems with OmniLink
Finn Hackett, Evan Wrench, Peter Macko, A. Jesse Jiryu Davis, Yuanhao Wei, Ivan Beschastnikh

TL;DR
OmniLink is a novel validation methodology for concurrent systems that uses TLA+ and model checking to verify correctness, outperforming existing tools and discovering new bugs in industrial and research systems.
Contribution
The paper introduces OmniLink, a flexible, non-intrusive validation approach that improves linearizability checking and bug detection in concurrent systems using TLA+ and model checking.
Findings
Successfully validated WiredTiger, BAT, and ConcurrentQueue systems.
Discovered two previously unknown bugs in concurrent data structures.
Outperformed existing linearizability checking tools on large-scale validation tasks.
Abstract
Concurrent systems are notoriously difficult to validate: subtle bugs may only manifest under rare thread interleavings, and existing tools often require intrusive instrumentation or unrealistic execution models. We present OmniLink, a new methodology for validating concurrent implementations against high-level specifications in TLA+. Unlike prior TLA+ based approaches which use a technique called trace validation, OmniLink treats system events as black boxes with a timebox in which they occurred and a meaning in TLA+, solving for a logical total order of actions. Unlike prior approaches based on linearizability checking, which already solves for total orders of actions with timeboxes, OmniLink uses a flexible specification language, and offers a different linearizability checking method based on off-the-shelf model checking. OmniLink offers different features compared existing…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsSoftware System Performance and Reliability · Distributed systems and fault tolerance · Software Testing and Debugging Techniques
