Wherefore Art Thou? Provenance-Guided Automatic Online Debugging with Lumos
Jingyuan Chen, Lei Zhang, Leon Schuermann, Gongqi Huang, Ravi Netravali, Amit Levy

TL;DR
Lumos is an online debugging framework for distributed systems that captures provenance information at low overhead, helping developers identify the root causes of non-deterministic bugs.
Contribution
Lumos introduces a dependency-guided instrumentation approach that uses static analysis to decide which provenance data to record automatically at runtime.
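The summary does not describe the static analysis in detail. The following is a minimal sketch of one common form of dependency-guided selection: a backward slice over a static data-dependency graph, starting from the statement where a symptom appears. It is not Lumos's actual algorithm. The graph `DEPS`, its statement names, and the `backward_slice` helper are hypothetical and exist only for illustration.

```python
from collections import deque

# Hypothetical static data-dependency graph: each statement maps to the
# statements whose results it reads. Statement IDs are illustrative only.
DEPS: dict[str, set[str]] = {
    "s5_assert_balance": {"s4_update_balance"},
    "s4_update_balance": {"s2_read_msg", "s3_load_balance"},
    "s3_load_balance": {"s1_init_cache"},
    "s2_read_msg": set(),
    "s1_init_cache": set(),
    "s6_log_metrics": {"s2_read_msg"},  # reads shared data but cannot affect the symptom
}


def backward_slice(deps: dict[str, set[str]], symptom: str) -> set[str]:
    """Return every statement the symptom transitively depends on, including itself."""
    seen = {symptom}
    queue = deque([symptom])
    while queue:  # breadth-first walk along reverse data-dependency edges
        stmt = queue.popleft()
        for dep in deps.get(stmt, ()):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


# Only statements in the slice would be instrumented to record provenance.
to_instrument = backward_slice(DEPS, "s5_assert_balance")
print(sorted(to_instrument))
```

In this toy graph, `s6_log_metrics` falls outside the slice. Under this assumption, instrumentation cost would scale with the slice rather than with the whole program, which is one way static dependency information could keep runtime overhead low.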
Findings
Lumos effectively exposes bug provenance with minimal runtime overhead.
It enables quick root cause analysis from limited bug occurrences.
Lumos outperforms existing tools in provenance collection efficiency.
Abstract
Debugging distributed systems in production is inevitable and hard. Myriad interactions between concurrent components in modern, complex, large-scale systems cause non-deterministic bugs that offline testing and verification fail to capture. When bugs surface at runtime, their root causes may be far removed from their symptoms. To identify a root cause, developers often need evidence scattered across multiple components and traces. Unfortunately, existing tools fail to quickly and automatically record useful provenance information at low overhead, leaving developers to perform the onerous task of collecting evidence manually. Lumos is an online debugging framework that exposes application-level bug provenances: the computational history linking the symptoms of an incident to their root causes. Lumos leverages dependency-guided instrumentation powered by static analysis to identify program…
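To make "computational history linking symptoms to root causes" concrete, here is a minimal sketch assuming provenance is stored as records with parent links that may span components. Tracing a symptom back to its root causes then becomes a graph walk. The `ProvRecord` schema, the component names, and `trace_history` are hypothetical illustrations, not Lumos's record format or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProvRecord:
    rid: str               # unique record id
    component: str         # service or node that produced the record
    event: str             # human-readable description of the computation
    parents: tuple = ()    # ids of records this one was derived from


def trace_history(log: list[ProvRecord], symptom_id: str):
    """Walk parent links from the symptom; records with no parents are candidate root causes."""
    by_id = {r.rid: r for r in log}
    history, roots = [], []
    stack, seen = [symptom_id], set()
    while stack:  # depth-first traversal of the provenance graph
        rid = stack.pop()
        if rid in seen:
            continue
        seen.add(rid)
        rec = by_id[rid]
        history.append(rec)
        if not rec.parents:
            roots.append(rec)
        stack.extend(rec.parents)
    return history, roots


# Hypothetical cross-component log: a stale config push leads to a client-visible error.
log = [
    ProvRecord("r1", "config-svc", "pushed timeout=0"),
    ProvRecord("r2", "client", "sent request"),
    ProvRecord("r3", "server-a", "applied timeout", ("r1",)),
    ProvRecord("r4", "server-a", "dropped request", ("r2", "r3")),
    ProvRecord("r5", "client", "observed error", ("r4",)),
    ProvRecord("r6", "server-b", "heartbeat"),  # unrelated to the symptom
]
history, roots = trace_history(log, "r5")
print([r.rid for r in roots])  # ['r1', 'r2']
```

The walk reaches the config push on `config-svc`, a different component from where the error surfaced, while the unrelated heartbeat `r6` never enters the history. This matches the abstract's point that root causes may be far removed from their symptoms.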
