Benchmarking LLM-Driven Network Configuration Repair
Ioannis Protogeros, Rufat Asadli, Benjamin Hoffman, Laurent Vanbever

TL;DR
This paper introduces Cornetto, a comprehensive benchmark for assessing the effectiveness and safety of LLMs in repairing large-scale network configurations, highlighting current limitations and guiding future improvements.
Contribution
The paper presents Cornetto, the first benchmark that synthesizes diverse network misconfigurations and evaluates LLMs using formal verification for functional correctness.
Findings
LLMs often introduce regressions when repairing network configurations, causing new disruptions while fixing the original error.
Performance of LLMs degrades as network scale increases.
Integrating LLMs with formal verification improves reliability.
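The findings separate fixing the original error from introducing a regression. A minimal sketch of that distinction follows, assuming a verifier reports the set of violated properties before and after a repair. The names `classify_repair` and `RepairOutcome`, and the property labels in the usage note, are hypothetical illustrations and not Cornetto's actual interface.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class RepairOutcome:
    """Result of comparing verifier output before and after a repair (hypothetical type)."""
    fixed: frozenset        # violated before, satisfied after
    remaining: frozenset    # violated before and still violated after
    regressions: frozenset  # satisfied before, violated after

    @property
    def is_safe_repair(self) -> bool:
        # A repair counts as safe only if nothing is left broken
        # and nothing new was broken.
        return not self.remaining and not self.regressions


def classify_repair(violated_before: Iterable[str],
                    violated_after: Iterable[str]) -> RepairOutcome:
    """Compare two sets of violated property names (assumed verifier output)."""
    before = frozenset(violated_before)
    after = frozenset(violated_after)
    return RepairOutcome(
        fixed=before - after,
        remaining=before & after,
        regressions=after - before,
    )
```

For example, `classify_repair({"r1-reaches-r3"}, {"no-loop-r2"})` reports one fixed property and one regression, so `is_safe_repair` is `False` even though the original error is gone. A check of this kind is one way a verifier can reject an LLM-proposed repair before it is applied.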
Abstract
There is a rapidly growing interest in using Large Language Models (LLMs) to automate complex network operations, but their reliable adoption requires rigorous assessment of their effectiveness and safety. Existing benchmarks do not address whether LLMs can successfully resolve errors in large-scale, interdependent network configurations without introducing new disruptions. Developing such a benchmark is challenging: scenarios must be diverse and increasingly complex, yet their evaluation must be straightforward and meaningful. In this paper, we present Cornetto, the first benchmark to evaluate LLM-driven network configuration repair functionally and at scale. Cornetto features a generation pipeline that synthesizes representative and plausible misconfiguration scenarios, coupled with an evaluation framework that uses formal verification to assess functional correctness of proposed…
