Estimation from Partially Sampled Distributed Traces
Otmar Ertl

TL;DR
This paper introduces a scalable, adaptive sampling method for distributed tracing that preserves important events better than traditional approaches and includes an unbiased estimation algorithm for partially sampled traces.
Contribution
It proposes a novel adaptive sampling technique for distributed traces and an unbiased estimation algorithm to improve analysis accuracy with partial data.
Findings
Sampling rates can be set independently for each span.
The estimation algorithm reduces error compared to using only complete traces.
The approach is scalable and adapts to resource constraints.
Abstract
Sampling is often a necessary evil to reduce the processing and storage costs of distributed tracing. In this work, we describe a scalable and adaptive sampling approach that can preserve events of interest better than the widely used head-based sampling approach. Sampling rates can be chosen individually and independently for every span, so that span attributes and local resource constraints can be taken into account. The resulting traces are often only partially rather than completely sampled, which complicates statistical analysis. To exploit the given information, an unbiased estimation algorithm is presented. Even though it does not need to know whether the traces are complete, it reduces the estimation error in many cases compared to considering only complete traces.
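To make the idea of unbiased estimation from partially sampled traces concrete, here is a minimal Python sketch. It is not the paper's algorithm. It assumes a generic consistent-sampling scheme in which every trace draws one shared random number and each span is kept if that number falls below the span's own sampling probability, and it uses standard inverse-probability (Horvitz-Thompson) weighting. The function names, the span names, and the assumption that span names are unique within a trace are all hypothetical and chosen only for illustration.

```python
import random


def sample_trace(spans, rng):
    """Assumed scheme: one shared random number r per trace.

    spans is a list of (name, sampling_probability) pairs; a span is
    kept if r < probability, so each span has its own sampling rate.
    """
    r = rng.random()
    return [(name, p) for name, p in spans if r < p]


def estimate_span_count(sampled_traces, name):
    """Inverse-probability estimate of how many spans named `name` occurred."""
    return sum(
        1.0 / p
        for trace in sampled_traces
        for span_name, p in trace
        if span_name == name
    )


def estimate_pair_count(sampled_traces, a, b):
    """Estimate how many traces contained both span `a` and span `b`.

    With a shared r, both spans survive with probability min(p_a, p_b),
    so each observed pair is weighted by the inverse of that value.
    """
    total = 0.0
    for trace in sampled_traces:
        probs = dict(trace)  # assumes span names are unique within a trace
        if a in probs and b in probs:
            total += 1.0 / min(probs[a], probs[b])
    return total


if __name__ == "__main__":
    rng = random.Random(7)
    # Hypothetical trace shape: a frontend span kept at 50%, a db span at 25%.
    spans = [("frontend", 0.5), ("db", 0.25)]
    sampled = [sample_trace(spans, rng) for _ in range(10_000)]
    print("estimated db spans:", estimate_span_count(sampled, "db"))
    print("estimated frontend+db traces:", estimate_pair_count(sampled, "frontend", "db"))
```

In this sketch, traces that keep only the frontend span are partially sampled, yet they still contribute to the frontend count. That is the intuition behind using partial traces instead of discarding everything that is incomplete.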
Taxonomy
Topics: Software System Performance and Reliability · Advanced Database Systems and Queries · Distributed systems and fault tolerance
