TeleCom-Bench: How Far Are Large Language Models from Industrial Telecommunication Applications?

Jieting Xiao; Yun Lin; Huizhen Qiu; Rui Ma; Chen Zhong; Dongyang Xu; Xiao Long; Chaoyu Zhang; Qiaobo Hao; Ding Zou; Zhiguo Yang; Yanqin Gao; Fang Tan

arXiv:2605.18025 · cs.AI · May 19, 2026

TL;DR

TeleCom-Bench introduces a comprehensive benchmark to evaluate large language models in telecommunications, highlighting their strengths in understanding telecom knowledge but exposing significant gaps in procedural application tasks.

Contribution

The paper presents a new benchmark with 12 evaluation sets for assessing LLMs in telecom, including knowledge comprehension and end-to-end application tasks, and provides insights into current model limitations.

Findings

01. Models achieve 90% accuracy in linguistic tasks.
02. Performance drops to ~30% in procedural tasks.
03. Current LLMs are effective diagnosticians but not field engineers.

Abstract

While Large Language Models have achieved remarkable integration in various vertical scenarios, their deployment in the telecommunications domain remains exploratory due to the lack of a standardized evaluation framework. Current telecom benchmarks primarily focus on static, foundational knowledge and isolated atomic skills, neglecting the equipment-specific documentation and end-to-end industrial workflows essential for real-world production systems. To bridge this gap, we present TeleCom-Bench, a comprehensive benchmark comprising 12 evaluation sets with 22,678 curated samples, which evaluates LLMs across a synergistic hierarchy: (1) Multi-dimensional Knowledge Comprehension, which integrates telecommunication fundamentals, 3GPP protocols, and 5G network architecture with proprietary product knowledge across wired, core, and wireless networks via knowledge graph-driven synthesis; and…

Code & Models

Repository: ZTE-AICloud/TeleCom-Bench (GitHub)
