Can Large Language Models Automate the Refinement of Cellular Network Specifications?
Jianshuo Dong, Yuanjie Li, Jun Liu, Hewu Li, Han Qiu

TL;DR
This paper explores using Large Language Models to automate the refinement of cellular network specifications, introducing a benchmark and demonstrating the effectiveness of fine-tuned models in identifying specification weaknesses.
Contribution
It presents CR-Eval, a new benchmark for cellular network specification refinement, and shows that fine-tuned LLMs can outperform larger, general models in this domain.
Findings
GPT-o3-mini, the best-performing model, identifies weaknesses in over 127 of the 200 CR-Eval test cases within five trials.
A fine-tuned 8B model outperforms larger LLMs such as DeepSeek-R1 and Qwen3-235B.
Evaluations on 30 real-world cellular attacks point to practical benefits.
Abstract
Cellular networks, e.g., 4G/5G, rely on complex technical specifications to ensure correct functionality; however, these specifications often contain flaws or ambiguities. In this paper, we investigate the application of Large Language Models for automated cellular network specification refinement. We identify Change Requests, which record specification revisions, as a key source of domain-specific data and formulate specification refinement as three complementary sub-tasks. We introduce CR-Eval, a benchmark of 200 security-related test cases, and evaluate 17 open-source and 14 proprietary models. The best-performing model, GPT-o3-mini, identifies weaknesses in over 127 test cases within five trials. We further study LLM specialization, showing that fine-tuning an 8B model can outperform advanced LLMs such as DeepSeek-R1 and Qwen3-235B. Evaluations on 30 real-world cellular attacks…
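The "within five trials" figure in the abstract can be read as a count of test cases for which at least one of five attempts succeeds. The sketch below illustrates that reading only; the paper's exact scoring procedure is not given here, and the function name `solved_within_k`, the case identifiers, and the outcome data are all hypothetical.

```python
from typing import Dict, List


def solved_within_k(trials: Dict[str, List[bool]], k: int = 5) -> int:
    """Count test cases where at least one of the first k trials succeeded.

    `trials` maps a test-case id to a list of per-trial outcomes
    (True = the model identified the weakness in that trial).
    This is an assumed interpretation of "within five trials".
    """
    return sum(1 for outcomes in trials.values() if any(outcomes[:k]))


# Hypothetical outcomes for three test cases, five trials each.
example = {
    "case-001": [False, False, True, False, False],
    "case-002": [False, False, False, False, False],
    "case-003": [True, True, False, True, True],
}

count = solved_within_k(example, k=5)
rate = count / len(example)
print(f"{count}/{len(example)} solved within 5 trials ({rate:.1%})")
```

Under this reading, the reported result of over 127 out of 200 test cases corresponds to a success rate above 63.5%.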
Taxonomy
Topics: Advanced MIMO Systems Optimization
