Reliable LLM-Based Edge-Cloud-Expert Cascades for Telecom Knowledge Systems

Qiushuo Hou, Sangwoo Park, Matteo Zecchin, Yunlong Cai, Guanding Yu, Osvaldo Simeone, and Tommaso Melodia

arXiv:2512.20012 · eess.SP · May 12, 2026

TL;DR

This paper presents an edge-cloud-expert LLM cascade for telecom knowledge tasks, with decision thresholds selected to keep average processing cost low while providing statistical guarantees on misalignment risk.

Contribution

It introduces a novel threshold selection method based on multiple hypothesis testing for LLM cascades, ensuring reliability and cost-efficiency in telecom applications.

Findings

01. The proposed method reduces processing costs compared to traditional cascades.

02. It provides finite-sample guarantees on misalignment risk.

03. Experiments on TeleQnA show improved cost-efficiency and reliability.
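To make the idea of a finite-sample guarantee obtained through multiple hypothesis testing more concrete, here is a minimal sketch. It is not the paper's procedure: it assumes a single confidence threshold (the paper's cascade has more than one decision stage), a Hoeffding-bound p-value, and a Bonferroni correction over a fixed candidate grid. The names `hoeffding_p_value` and `select_threshold`, the calibration format, and the parameters `alpha` (target risk) and `delta` (allowed failure probability) are all hypothetical.

```python
import math


def hoeffding_p_value(emp_risk, n, alpha):
    """P-value for the null 'true risk > alpha', valid for losses in [0, 1].

    Assumption: a Hoeffding bound is used here for simplicity; other valid
    p-values could be substituted.
    """
    gap = max(alpha - emp_risk, 0.0)
    return math.exp(-2.0 * n * gap * gap)


def select_threshold(calib, candidates, alpha, delta):
    """Pick the cheapest candidate threshold that passes a Bonferroni test.

    calib: list of (confidence, aligned) pairs, where `aligned` says whether
           the automated answer agrees with the expert (hypothetical format).
    A query is answered automatically when confidence >= threshold and is
    deferred to an expert otherwise; deferred queries incur no misalignment.
    Returns None if no candidate can be certified.
    """
    n = len(calib)
    level = delta / len(candidates)  # Bonferroni-corrected test level
    passing = []
    for lam in candidates:
        errors = sum(1 for conf, ok in calib if conf >= lam and not ok)
        deferred = sum(1 for conf, _ in calib if conf < lam)
        if hoeffding_p_value(errors / n, n, alpha) <= level:
            # Deferral rate serves as a stand-in for processing cost.
            passing.append((deferred / n, lam))
    return min(passing)[1] if passing else None
```

Under these assumptions, any threshold returned by `select_threshold` has true misalignment risk at most `alpha` with probability at least `1 - delta` over the draw of the calibration set, and among certified thresholds the one with the lowest deferral rate is chosen.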

Abstract

Large language models (LLMs) are emerging as key enablers of automation in domains such as telecommunications, assisting with tasks including troubleshooting, standards interpretation, and network optimization. However, their deployment in practice must balance inference cost, latency, and reliability. In this work, we study an edge-cloud-expert cascaded LLM-based knowledge system that supports decision-making through a question-and-answer pipeline. In it, an efficient edge model handles routine queries, a more capable cloud model addresses complex cases, and human experts are involved only when necessary. We define a misalignment-cost constrained optimization problem, aiming to minimize average processing cost, while guaranteeing alignment of automated answers with expert judgments. We propose a statistically rigorous threshold selection method based on multiple hypothesis testing…
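The question-and-answer pipeline described in the abstract can be sketched as a three-stage router. This is an illustration under stated assumptions, not the authors' implementation: it assumes each model returns an answer together with a confidence score in [0, 1], that escalation is triggered by comparing that score with a per-stage threshold, and that each stage has a fixed cost. The class `Cascade`, its field names, and the default cost values are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# (answer text, confidence score in [0, 1]); the score type is an assumption.
Answer = Tuple[str, float]


@dataclass
class Cascade:
    edge: Callable[[str], Answer]    # efficient model for routine queries
    cloud: Callable[[str], Answer]   # more capable model for complex cases
    expert: Callable[[str], str]     # human expert, used only when necessary
    edge_threshold: float
    cloud_threshold: float
    edge_cost: float = 1.0           # illustrative costs, not from the paper
    cloud_cost: float = 10.0
    expert_cost: float = 100.0

    def answer(self, question: str) -> Tuple[str, str, float]:
        """Return (answer, stage that produced it, accumulated cost)."""
        cost = self.edge_cost
        text, conf = self.edge(question)
        if conf >= self.edge_threshold:
            return text, "edge", cost

        # Edge model is not confident enough: escalate to the cloud model.
        cost += self.cloud_cost
        text, conf = self.cloud(question)
        if conf >= self.cloud_threshold:
            return text, "cloud", cost

        # Neither model is confident enough: defer to a human expert.
        cost += self.expert_cost
        return self.expert(question), "expert", cost
```

In this sketch the two thresholds control the trade-off the abstract describes: lowering them keeps more queries at cheaper stages and reduces average cost, while raising them sends more queries to the cloud model or the expert and reduces the chance of a misaligned automated answer.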
