FaaSter Troubleshooting -- Evaluating Distributed Tracing Approaches for Serverless Applications
Maria C. Borges, Sebastian Werner, Ahmet Kilic

TL;DR
This paper evaluates distributed tracing methods to improve fault detection in serverless applications, comparing developer-driven and platform-supported approaches through a model and empirical measurements.
Contribution
The paper introduces a fault observability model for serverless applications and compares two distributed tracing approaches, developer-driven and platform-supported, examining their trade-offs and effectiveness.
Findings
Platform-supported tracing reduces troubleshooting time.
Developer-driven tracing offers more detailed fault insights.
Tracing comes at a cost: increased latency and resource use.
Abstract
Serverless applications can be particularly difficult to troubleshoot, as these applications are often composed of various managed and partly managed services. Faults are often unpredictable and can occur at multiple points, even in simple compositions. Each additional function or service in a serverless composition introduces a new possible fault source and a new layer to obfuscate faults. Currently, serverless platforms offer only limited support for identifying runtime faults. Developers looking to observe their serverless compositions often have to rely on scattered logs and ambiguous error messages to pinpoint root causes. In this paper, we investigate the use of distributed tracing for improving the observability of faults in serverless applications. To this end, we first introduce a model for characterizing fault observability, then provide a prototypical tracing implementation -…
