Human readable network troubleshooting based on anomaly detection and feature scoring
Jose M. Navarro, Alexis Huet, Dario Rossi

TL;DR
This paper introduces a comprehensive network troubleshooting system that combines unsupervised anomaly detection, feature ranking via attention mechanisms, and expert knowledge integration to improve diagnosis efficiency and accuracy.
Contribution
It presents a novel system integrating multiple state-of-the-art methods and evaluates their combined effectiveness on real-world datasets for network troubleshooting.
Findings
High agreement with expert diagnoses
Simple statistical methods enhance troubleshooting performance
Effective in constrained stream-mode settings
Abstract
Network troubleshooting is still a heavily human-intensive process. To reduce the time spent by human operators in the diagnosis process, we present a system based on (i) unsupervised learning methods for detecting anomalies in the time domain, (ii) an attention mechanism to rank features in the feature space, and finally (iii) an expert knowledge module able to seamlessly incorporate previously collected domain knowledge. In this paper, we thoroughly evaluate the performance of the full system and of its individual building blocks: in particular, we consider (i) 10 anomaly detection algorithms as well as (ii) 10 attention mechanisms, which comprehensively represent the current state of the art in the respective fields. Leveraging a unique collection of expert-labeled datasets worth several months of real router telemetry data, we perform a thorough performance evaluation contrasting…
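The three stages named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: a per-feature z-score stands in for the unsupervised anomaly detectors, the mean absolute z-score at anomalous timestamps stands in for the attention mechanism, and a per-feature multiplier stands in for the expert knowledge module. The function names, the threshold of 2.0, the feature names, and the toy telemetry values are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def zscores(series):
    """Standardize one feature's time series (all zeros if it is constant)."""
    mu = mean(series)
    sigma = pstdev(series)
    if sigma == 0:
        return [0.0 for _ in series]
    return [(x - mu) / sigma for x in series]


def detect_anomalies(data, threshold=2.0):
    """Stage (i), simplified: flag timestamps where any feature's |z| exceeds
    the threshold. `data` maps feature name -> equal-length list of samples."""
    z = {name: zscores(values) for name, values in data.items()}
    n = len(next(iter(data.values())))
    scores = [max(abs(z[name][t]) for name in z) for t in range(n)]
    anomalous = [t for t, s in enumerate(scores) if s > threshold]
    return anomalous, z


def rank_features(z, anomalous, expert_weights=None):
    """Stages (ii) and (iii), simplified: score each feature by its mean |z|
    over the anomalous timestamps, then scale by an optional expert weight
    (hypothetical stand-in for prior domain knowledge)."""
    expert_weights = expert_weights or {}
    scores = {}
    for name, zs in z.items():
        base = mean(abs(zs[t]) for t in anomalous) if anomalous else 0.0
        scores[name] = base * expert_weights.get(name, 1.0)
    return sorted(scores, key=scores.get, reverse=True)


# Toy telemetry (made-up values): one spike at index 7.
telemetry = {
    "cpu_load": [10, 11, 10, 12, 11, 10, 11, 95, 10, 11],
    "packet_drops": [0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
    "temperature": [40, 41, 40, 41, 40, 41, 40, 44, 40, 41],
}

anomalous, z = detect_anomalies(telemetry)
print("anomalous timestamps:", anomalous)
print("ranking without expert input:", rank_features(z, anomalous))
print("ranking with expert input:", rank_features(z, anomalous, {"temperature": 2.0}))
```

The ranked list is what would be shown to the operator: instead of inspecting every telemetry feature, they start from the few that deviate most at the flagged time. The expert weight shows one simple way prior knowledge could reorder that list; the paper's own module may work differently.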
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Network Security and Intrusion Detection · Software System Performance and Reliability
