A Network Arena for Benchmarking AI Agents on Network Troubleshooting
Zhihao Wang, Alessandro Cornacchia, Alessio Sacco, Franco Galante, Marco Canini, Dingde Jiang

TL;DR
NIKA is a comprehensive, open-source benchmark for evaluating Large Language Model agents in network troubleshooting, enabling standardized assessment across diverse real-world scenarios and issues.
Contribution
This paper introduces NIKA, the largest public benchmark for LLM-based network troubleshooting, with standardized interfaces and diverse scenarios to facilitate research and development.
Findings
Larger models perform better at detecting network issues.
Models still struggle with fault localization and root cause analysis.
NIKA enables consistent evaluation of AI agents in network troubleshooting.
Abstract
Agentic systems, powered by Large Language Models (LLMs), assist network engineers with network configuration synthesis and network troubleshooting tasks. For network troubleshooting, progress is hindered by the absence of standardized and accessible benchmarks for evaluating LLM agents in dynamic network settings at low operational effort. We present NIKA, the largest public benchmark to date for LLM-driven network incident diagnosis and troubleshooting. NIKA targets domain experts and, especially, AI researchers, providing zero-effort replay of real-world network scenarios and establishing well-defined agent-network interfaces for quick agent prototyping. NIKA comprises hundreds of curated network incidents, spanning five network scenarios, from data centers to ISP networks, and covers 54 representative network issues. Lastly, NIKA is modular and extensible by design,…
Taxonomy
Topics: Software System Performance and Reliability · Software-Defined Networks and 5G · Advanced Graph Neural Networks
