TelcoAgent-Bench: A Multilingual Benchmark for Telecom AI Agents

Lina Bariah; Brahim Mefgouda; Farbod Tavakkoli; Enrique Molero; Louis Powell; Merouane Debbah

arXiv:2604.06209 · cs.CL · April 9, 2026

TL;DR

This paper introduces TelcoAgent-Bench, a multilingual benchmarking framework for evaluating telecom-specific LLM agents, focusing on intent recognition, process accuracy, and stability across English and Arabic scenarios.

Contribution

It presents a structured set of metrics and a framework specifically designed to assess the reliability and operational consistency of telecom LLM agents in multilingual environments.

Findings

01. Recent models understand telecom issues reasonably well.

02. Models struggle with consistent troubleshooting and stability.

03. Performance drops in bilingual and unconstrained scenarios.

Abstract

The integration of large language model (LLM) agents into telecom networks introduces new challenges related to intent recognition, tool execution, and resolution generation, all of which must account for different operational constraints. In this paper, we introduce TelcoAgent-Bench and TelcoAgent-Metrics, a telecom-specific benchmarking framework for evaluating multilingual telecom LLM agents. The proposed framework assesses semantic understanding, process-level alignment with structured troubleshooting flows, and stability across repeated scenario variations. Our contribution includes a structured suite of metrics that assess intent recognition, ordered tool execution, resolution correctness, and stability across scenario variations, with the aim of quantifying the reliability and operational consistency of LLM agents in telecom environments. The framework is designed…
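The abstract names its metric families but, as excerpted here, does not define them. The sketch below is therefore only an illustration of how two of them might be scored, not the TelcoAgent-Metrics definitions. It assumes ordered tool execution can be approximated by a longest-common-subsequence ratio against a reference flow, and stability by agreement with the most common outcome across repeated runs. All function names and tool names (`check_alarms`, `run_ping`, `reset_port`) are hypothetical.

```python
from collections import Counter
from typing import Sequence


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence of two tool-call sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def ordered_tool_score(predicted: Sequence[str], reference: Sequence[str]) -> float:
    """Assumed proxy for ordered tool execution: LCS length over the longer sequence.

    Penalizes missing tools, extra tools, and tools called out of order.
    """
    if not predicted and not reference:
        return 1.0
    return lcs_length(predicted, reference) / max(len(predicted), len(reference))


def stability(outcomes: Sequence[str]) -> float:
    """Assumed proxy for stability: share of runs matching the most common outcome."""
    if not outcomes:
        raise ValueError("need at least one run")
    most_common_count = Counter(outcomes).most_common(1)[0][1]
    return most_common_count / len(outcomes)


# Hypothetical reference troubleshooting flow and one agent trace.
reference_flow = ["check_alarms", "run_ping", "reset_port"]
agent_trace = ["run_ping", "check_alarms", "reset_port"]

# Hypothetical final resolutions from four repeated runs of the same scenario.
repeated_resolutions = ["reset_port", "reset_port", "escalate", "reset_port"]

if __name__ == "__main__":
    print(ordered_tool_score(agent_trace, reference_flow))
    print(stability(repeated_resolutions))
```

In this toy case the agent calls the right tools but swaps the first two, so only two of the three calls align in order with the reference flow. The LCS ratio is one of several reasonable choices; an edit-distance or exact-prefix score would penalize reordering differently.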
