NetRCA: An Effective Network Fault Cause Localization Algorithm
Chaoli Zhang, Zhiqiang Zhou, Yingying Zhang, Linxiao Yang, Kai He, Qingsong Wen, Liang Sun

TL;DR
NetRCA is an algorithm for localizing the root causes of network faults. It extracts derived features from raw data, generates additional training data from a limited set of labels, and applies an ensemble model; it is demonstrated on real-world data.
Contribution
The paper introduces a comprehensive approach combining feature extraction, data augmentation, and ensemble learning for network fault localization under limited labeled data.
Findings
Outperforms existing methods on a real-world dataset
Effectively utilizes unlabeled data through label propagation
Achieves high accuracy in fault cause localization
Abstract
Localizing the root cause of network faults is crucial to network operation and maintenance. However, due to the complicated network architectures and wireless environments, as well as limited labeled data, accurately localizing the true root cause is challenging. In this paper, we propose a novel algorithm named NetRCA to deal with this problem. Firstly, we extract effective derived features from the original raw data by considering temporal, directional, attribution, and interaction characteristics. Secondly, we adopt multivariate time series similarity and label propagation to generate new training data from both labeled and unlabeled data to overcome the lack of labeled samples. Thirdly, we design an ensemble model which combines XGBoost, rule set learning, attribution model, and graph algorithm, to fully utilize all data information and enhance performance. Finally, experiments and…
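The abstract does not say which similarity measure or propagation rule NetRCA uses, so the following is only a minimal sketch of the general idea in its second step: giving unlabeled multivariate time series a pseudo-label borrowed from a similar labeled series. It assumes z-normalized Euclidean distance and a nearest-neighbor rule with a distance cutoff. The function names, the `max_dist` threshold, and the `cause_1`/`cause_2` labels are hypothetical and do not come from the paper.

```python
import numpy as np

def znorm(x):
    """Standardize each feature column of a (T, D) series."""
    std = x.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant columns
    return (x - x.mean(axis=0)) / std

def series_distance(a, b):
    """Mean per-step Euclidean distance between two equal-length (T, D) series."""
    return float(np.linalg.norm(znorm(a) - znorm(b), axis=1).mean())

def propagate_labels(labeled, labels, unlabeled, max_dist=0.5):
    """Assign each unlabeled series the label of its nearest labeled series,
    but only when that neighbor is within max_dist (assumed cutoff)."""
    pseudo = []
    for i, u in enumerate(unlabeled):
        dists = [series_distance(u, s) for s in labeled]
        j = int(np.argmin(dists))
        if dists[j] <= max_dist:
            pseudo.append((i, labels[j], dists[j]))
    return pseudo

# Toy data: two labeled series with hypothetical root-cause labels.
t = np.linspace(0.0, 2.0 * np.pi, 50)
wave = np.stack([np.sin(t), np.cos(t)], axis=1)
ramp = np.stack([t, -t], axis=1)
labeled = [wave, ramp]
labels = ["cause_1", "cause_2"]

# One unlabeled series is a rescaled copy of `wave`; the other resembles neither.
unlabeled = [2.0 * wave + 3.0, np.stack([np.cos(t), np.sin(t)], axis=1)]

pseudo = propagate_labels(labeled, labels, unlabeled)
for index, label, dist in pseudo:
    print(f"unlabeled[{index}] -> {label} (distance {dist:.3f})")
```

In this sketch the cutoff trades coverage for label quality: a smaller `max_dist` produces fewer but more reliable pseudo-labels, and series that match no labeled example are left unlabeled rather than forced into a class.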
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Network Security and Intrusion Detection · Software System Performance and Reliability
