A Little Less Interaction, A Little More Action: A Modular Framework for Network Troubleshooting
István Pelle, Felicián Németh, András Gulyás

TL;DR
This paper presents a modular framework for network troubleshooting that formalizes and automates complex troubleshooting processes using executable graphs, aiming to bridge the gap between ideal automation and current manual practices.
Contribution
It introduces Troubleshooting Graphs (TSGs), a formalized, executable framework for network troubleshooting that enables automation, evaluation, and sharing of troubleshooting workflows.
Findings
Framework formalizes troubleshooting as executable graphs
Enables automation of complex troubleshooting processes
Facilitates sharing of troubleshooting knowledge
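To make the idea of troubleshooting as an executable graph concrete, here is a minimal sketch. It is not the paper's implementation. It assumes each node is a plain function that reads a shared context and names the next node to run. The node names, the context keys, and the `run_tsg` helper are all hypothetical, and the probes are replaced by precomputed values so the example runs without network access.

```python
from typing import Callable, Dict, List, Optional

# A node inspects the shared context and returns the name of the next node,
# or None to stop. Node names and context keys are hypothetical.
Node = Callable[[dict], Optional[str]]


def check_reachability(ctx: dict) -> Optional[str]:
    # Stand-in for a ping-style probe; reads a precomputed result instead.
    return "report_ok" if ctx["reachable"] else "check_route"


def check_route(ctx: dict) -> Optional[str]:
    # Stand-in for inspecting the routing table on the affected host.
    ctx["diagnosis"] = "path failure" if ctx["route_present"] else "missing route"
    return None


def report_ok(ctx: dict) -> Optional[str]:
    ctx["diagnosis"] = "no fault found"
    return None


def run_tsg(nodes: Dict[str, Node], start: str, ctx: dict) -> List[str]:
    """Walk the graph from `start` and return the visited node names in order."""
    visited: List[str] = []
    current: Optional[str] = start
    while current is not None:
        if current in visited:
            # This sketch treats a revisit as an error rather than looping forever.
            raise RuntimeError(f"cycle detected at node {current!r}")
        visited.append(current)
        current = nodes[current](ctx)
    return visited


# The graph itself is just a mapping from node name to function.
TSG: Dict[str, Node] = {
    "check_reachability": check_reachability,
    "check_route": check_route,
    "report_ok": report_ok,
}

if __name__ == "__main__":
    context = {"reachable": False, "route_present": False}
    path = run_tsg(TSG, "check_reachability", context)
    print(path, context["diagnosis"])
```

Because the graph is ordinary data (a dictionary of named functions), it can be stored, shared, and re-run against a different context, which is the property the findings above emphasize.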
Abstract
An ideal network troubleshooting system would be an almost fully automated system, monitoring the whole network at once, feeding the results to a knowledge-based decision making system that suggests actions to the operator or corrects the failure automatically. Reality is quite the contrary: operators separated in their offices try to track down complex networking failures in their own way, which is generally a long sequence of manually edited parallel shell commands (mostly ping, traceroute, route, iperf, ofctl etc.). This process requires operators to be "masters of complexity" (which they often are) and continuous interaction. In this paper we aim at narrowing this huge gap between vision and reality by introducing a modular framework capable of (i) formalizing troubleshooting processes as the concatenation of executable functions [called Troubleshooting Graphs (TSGs)], (ii)…
Taxonomy
Topics: Software-Defined Networks and 5G · Network Security and Intrusion Detection · Software System Performance and Reliability
