SWE-Bench 5G: Benchmarking AI Coding Agents on Telecom Network Engineering Tasks
Jiao Chen, Jianhua Tang, Xiaotong Yang, and Zuohong Lv

TL;DR
This paper introduces SWE-Bench 5G, a benchmark for evaluating AI coding agents on real-world 5G network bug fixing, highlighting the importance of domain knowledge and iterative editing.
Contribution
It presents the first specialized benchmark for AI agents in 5G telecom network bug resolution, including a dual test strategy and domain knowledge evaluation.
Findings
All four evaluated LLMs diagnose bugs at rates exceeding 91%
Bug resolution rates are between 10% and 30%
Domain knowledge improves resolution for specification-dependent bugs
Abstract
AI coding agents demonstrate strong performance on general-purpose software benchmarks. However, their ability to handle 5G network engineering tasks remains unexplored. We propose SWE-Bench 5G, the first benchmark designed to investigate whether AI coding agents can resolve real-world bugs in 5G core network software. The benchmark collects task instances from three open-source 5G projects, packages each as a self-contained Docker environment with automated fail-to-pass tests, and provides a dual test strategy tailored to the complex runtime dependencies of telecom code. In addition, for instances whose original issues reference 3GPP specification clauses, we construct concise specification context documents, enabling controlled evaluation of whether domain knowledge improves agent performance. Experiments on four LLMs reveal that all models diagnose bugs at rates exceeding 91%, yet…
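
The fail-to-pass criterion mentioned in the abstract can be illustrated with a minimal sketch. This is not the benchmark's actual harness: the function name `is_resolved`, the result dictionaries, and the test names are all hypothetical, and the sketch assumes test outcomes have already been collected as booleans before and after the agent's patch is applied.

```python
from typing import Mapping, Sequence


def is_resolved(
    before: Mapping[str, bool],
    after: Mapping[str, bool],
    fail_to_pass: Sequence[str],
) -> bool:
    """Hypothetical check: an instance counts as resolved only if every
    designated test failed on the buggy code and passes after the patch."""
    for test in fail_to_pass:
        # A fail-to-pass test must fail before the patch; otherwise the
        # task instance itself is malformed.
        if before.get(test, False):
            raise ValueError(f"{test} already passes on the buggy code")
        # A test missing from the post-patch results is treated as failing.
        if not after.get(test, False):
            return False
    return True


# Illustrative data only; these test names are made up.
before_patch = {"test_session_release": False, "test_registration": True}
after_patch = {"test_session_release": True, "test_registration": True}
resolved = is_resolved(before_patch, after_patch, ["test_session_release"])
```

Under this reading, a patch that diagnoses the bug correctly but leaves any designated test failing still counts as unresolved, which is one way a high diagnosis rate can coexist with a much lower resolution rate.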
