Towards a Playground to Democratize Experimentation and Benchmarking of AI Agents for Network Troubleshooting
Zhihao Wang, Alessandro Cornacchia, Franco Galante, Carlo Centofanti, Alessio Sacco, Dingde Jiang

TL;DR
This paper advocates for a standardized benchmarking platform to facilitate the development and evaluation of AI agents in network troubleshooting, aiming to democratize experimentation and improve reproducibility.
Contribution
It introduces the concept of a playground platform dedicated to benchmarking AI agents for network troubleshooting, addressing current gaps in reproducibility and operational ease.
Findings
Highlights the importance of standardized benchmarking for AI in network troubleshooting
Proposes a framework for an open, reproducible platform for AI agent evaluation
Emphasizes low operational effort in building and testing AI agents
Abstract
Recent research has demonstrated the effectiveness of Artificial Intelligence (AI), and more specifically Large Language Models (LLMs), in supporting network configuration synthesis and automating network diagnosis tasks, among others. In this preliminary work, we restrict our focus to the application of AI agents to network troubleshooting and elaborate on the need for a standardized, reproducible, and open benchmarking platform on which to build and evaluate AI agents with low operational effort.
Taxonomy
Topics: Network Security and Intrusion Detection · Mobile Agent-Based Network Management · Multi-Agent Systems and Negotiation
