TelcoAgent-Bench: A Multilingual Benchmark for Telecom AI Agents
Lina Bariah, Brahim Mefgouda, Farbod Tavakkoli, Enrique Molero, Louis Powell, Merouane Debbah

TL;DR
This paper introduces TelcoAgent-Bench, a multilingual benchmarking framework for evaluating telecom-specific LLM agents, focusing on intent recognition, process accuracy, and stability across English and Arabic scenarios.
Contribution
It presents a structured set of metrics and a framework specifically designed to assess the reliability and operational consistency of telecom LLM agents in multilingual environments.
Findings
Recent models understand telecom issues reasonably well.
Models struggle with consistent troubleshooting and stability.
Performance drops in bilingual and unconstrained scenarios.
Abstract
The integration of large language model (LLM) agents into telecom networks introduces new challenges related to intent recognition, tool execution, and resolution generation, all of which must account for different operational constraints. In this paper, we introduce TelcoAgent-Bench and TelcoAgent-Metrics, a telecom-specific benchmarking framework for evaluating multilingual telecom LLM agents. The proposed framework assesses semantic understanding, process-level alignment with structured troubleshooting flows, and stability across repeated scenario variations. Our contribution includes a structured suite of metrics that assess intent recognition, ordered tool execution, resolution correctness, and stability across scenario variations, with the aim of quantifying the reliability and operational consistency of LLM agents in telecom environments. The framework is designed…
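To make "ordered tool execution" and "stability across scenario variations" concrete, here is a minimal sketch of how such scores could be computed. It is an illustration only: this page does not give the TelcoAgent-Metrics definitions, so the function names, the longest-common-subsequence scoring, the majority-agreement stability measure, and the tool names are all assumptions, not the authors' method.

```python
from collections import Counter


def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def ordered_tool_score(expected, actual):
    """Fraction of the expected tool sequence that the agent reproduced
    in the right relative order (hypothetical metric, not from the paper)."""
    if not expected:
        return 1.0 if not actual else 0.0
    return lcs_length(expected, actual) / len(expected)


def stability(outcomes):
    """Share of scenario variations that agree with the most common outcome
    (hypothetical metric, not from the paper)."""
    if not outcomes:
        raise ValueError("need at least one outcome")
    most_common_count = Counter(outcomes).most_common(1)[0][1]
    return most_common_count / len(outcomes)


# Hypothetical troubleshooting flow and agent trace; the tool names are invented.
expected_flow = ["check_alarms", "run_ping", "restart_cell"]
agent_trace = ["check_alarms", "restart_cell", "run_ping"]
order_score = ordered_tool_score(expected_flow, agent_trace)

# Hypothetical outcomes of the same scenario under four paraphrased variations.
variation_outcomes = ["resolved", "resolved", "escalated", "resolved"]
stability_score = stability(variation_outcomes)
```

In this sketch the agent calls every expected tool but swaps the last two, so only two of the three steps are in the right relative order, and three of the four variations agree on the outcome. A process-level score of this kind penalizes an agent that reaches the right answer through the wrong procedure, which a purely semantic check on the final answer would miss.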
