TeleEval-OS: Performance evaluations of large language models for operations scheduling
Yanyan Wang, Yingying Wang, Junli Liang, Yin Xu, Yunlong Liu, Yiming Xu, Zhengwang Jiang, Zhehe Li, Fei Li, Long Zhao, Kuang Xu, Qi Song, and Xiangyang Li

TL;DR
This paper introduces TeleEval-OS, a benchmark for evaluating large language models on telecommunications operation scheduling, and shows that open-source LLMs can outperform closed-source models on specific tasks.
Contribution
The paper presents the first dedicated benchmark for LLMs in telecom operation scheduling, including diverse datasets and a hierarchical task evaluation framework.
Findings
Open-source LLMs can outperform closed-source LLMs in certain scenarios.
The benchmark covers 15 datasets across 13 subtasks in telecom operations.
LLM performance varies with task complexity level.
Abstract
The rapid advancement of large language models (LLMs) has significantly propelled progress in artificial intelligence, demonstrating substantial application potential across multiple specialized domains. Telecommunications operation scheduling (OS) is a critical aspect of the telecommunications industry, involving the coordinated management of networks, services, risks, and human resources to optimize production scheduling and ensure unified service control. However, the inherent complexity and domain-specific nature of OS tasks, coupled with the absence of comprehensive evaluation benchmarks, have hindered thorough exploration of LLMs' application potential in this critical field. To address this research gap, we propose the first Telecommunications Operation Scheduling Evaluation Benchmark (TeleEval-OS). Specifically, this benchmark comprises 15 datasets across 13 subtasks,…
Taxonomy
Topics: Distributed and Parallel Computing Systems · Service-Oriented Architecture and Web Services · Software System Performance and Reliability
