Fast Dimensional Analysis for Root Cause Investigation in a Large-Scale Service Environment
Fred Lin, Keyur Muzumdar, Nikolay Pavlovich Laptev, Mihai-Valentin Curelea, Seunghak Lee, Sriram Sankar

TL;DR
This paper introduces a fast, scalable dimensional analysis framework using frequent item-set mining techniques to automate root cause analysis in large-scale, complex production environments, improving speed and interpretability.
Contribution
The paper presents a novel scalable framework combining Apriori and FP-Growth algorithms with pre- and post-processing for effective root cause analysis in large-scale logs.
Findings
Successfully applied in large-scale production environments
Improved speed and interpretability of root cause analysis
Demonstrated effectiveness through multiple real-world use cases
Abstract
Root cause analysis in a large-scale production environment is challenging due to the complexity of services running across global data centers. Due to the distributed nature of a large-scale system, the various hardware, software, and tooling logs are often maintained separately, making it difficult to review the logs jointly for understanding production issues. Another challenge in reviewing the logs for identifying issues is the scale: there could easily be millions of entities, each described by hundreds of features. In this paper we present a fast dimensional analysis framework that automates the root cause analysis on structured logs with improved scalability. We first explore item-sets, i.e., combinations of feature values, that could identify groups of samples with sufficient support for the target failures using the Apriori algorithm and a subsequent improvement, FP-Growth.…
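To make the idea of item-sets with "sufficient support" concrete, the following is a minimal Python sketch of a level-wise (Apriori-style) search over feature-value combinations. It is an illustration only, not the authors' implementation: the function name `frequent_itemsets`, its parameters, and the toy log rows are all hypothetical, and support is assumed here to mean the fraction of failing samples that contain the item-set.

```python
from itertools import combinations


def frequent_itemsets(rows, min_support, max_len=3):
    """Apriori-style search for frequent feature-value combinations.

    rows: list of dicts (feature name -> value), assumed to hold only
          the samples that hit the target failure.
    min_support: minimum fraction of rows that must contain an item-set.
    max_len: largest item-set size to explore (hypothetical cap).
    Returns a dict mapping frozenset of (feature, value) pairs -> support.
    """
    n = len(rows)
    transactions = [frozenset(r.items()) for r in rows]

    def support(itemset):
        # Fraction of rows that contain every (feature, value) pair.
        return sum(itemset <= t for t in transactions) / n

    # Level 1 candidates: every single (feature, value) item seen.
    current = {frozenset([item]) for t in transactions for item in t}
    result = {}
    k = 1
    while current and k <= max_len:
        # Keep only candidates that meet the support threshold.
        kept = {}
        for candidate in current:
            s = support(candidate)
            if s >= min_support:
                kept[candidate] = s
        result.update(kept)

        # Build (k+1)-item candidates by joining frequent k-item sets.
        # Apriori pruning: every k-item subset must itself be frequent.
        k += 1
        current = set()
        for a, b in combinations(kept, 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in kept for sub in combinations(union, k - 1)
            ):
                current.add(union)
    return result


# Made-up toy data: four failing hosts described by three features.
failures = [
    {"kernel": "5.4", "rack": "A", "firmware": "v2"},
    {"kernel": "5.4", "rack": "B", "firmware": "v2"},
    {"kernel": "5.4", "rack": "A", "firmware": "v1"},
    {"kernel": "4.9", "rack": "C", "firmware": "v2"},
]
found = frequent_itemsets(failures, min_support=0.5)
for itemset, s in sorted(found.items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), s)
```

In this toy run, `kernel=5.4` and `firmware=v2` each cover three of the four failing rows, while the pair `rack=A, firmware=v2` covers only one and is dropped. The pruning step is what keeps the search tractable as the number of features grows. FP-Growth, which the abstract names as the subsequent improvement, avoids explicit candidate generation and is not shown here.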
