Towards a Playground to Democratize Experimentation and Benchmarking of AI Agents for Network Troubleshooting

Zhihao Wang; Alessandro Cornacchia; Franco Galante; Carlo Centofanti; Alessio Sacco; Dingde Jiang

arXiv:2507.01997·cs.NI·July 8, 2025

TL;DR

This paper advocates for a standardized benchmarking platform to facilitate the development and evaluation of AI agents in network troubleshooting, aiming to democratize experimentation and improve reproducibility.

Contribution

It introduces the concept of a playground platform dedicated to benchmarking AI agents for network troubleshooting, addressing current gaps in reproducibility and operational ease.

Findings

01 Highlights the importance of standardized benchmarking for AI in network troubleshooting

02 Proposes a framework for an open, reproducible platform for AI agent evaluation

03 Emphasizes low operational effort in building and testing AI agents

Abstract

Recent research has demonstrated the effectiveness of Artificial Intelligence (AI), and more specifically Large Language Models (LLMs), in supporting network configuration synthesis and automating network diagnosis, among other tasks. In this preliminary work, we restrict our focus to the application of AI agents to network troubleshooting and elaborate on the need for a standardized, reproducible, and open benchmarking platform on which AI agents can be built and evaluated with low operational effort.

Taxonomy

Topics: Network Security and Intrusion Detection · Mobile Agent-Based Network Management · Multi-Agent Systems and Negotiation