Enhancing Confidence Estimation in Telco LLMs via Twin-Pass CoT-Ensembling

Anton Saenko; Pranshav Gajjar; Abiodun Ganiyu; Vijay K. Shah

arXiv:2604.13271 · cs.LG · April 16, 2026

TL;DR

This paper introduces Twin-Pass CoT-Ensembling, a method that improves confidence calibration in telecom-specific LLMs so that the confidence they report for their own answers is more reliable.

Contribution

It proposes a novel ensemble approach that combines multiple chain-of-thought (CoT) reasoning passes to produce better-calibrated confidence scores for telecom-domain LLMs.
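The summary does not say how the passes are combined, so the following is only a hypothetical sketch of confidence ensembling over several reasoning passes, not the authors' Twin-Pass procedure. It assumes each pass returns a discrete answer (e.g. a multiple-choice option) plus a verbalized confidence in [0, 1]. The names `PassResult` and `ensemble_confidence`, and the "agreement fraction times mean confidence" rule, are illustrative choices of ours.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class PassResult:
    """Hypothetical record for one reasoning pass: chosen answer + verbalized confidence."""
    answer: str
    confidence: float  # assumed to lie in [0, 1]


def ensemble_confidence(passes: list[PassResult]) -> tuple[str, float]:
    """Combine several CoT passes into one answer and one confidence score.

    Illustrative rule only (not the paper's): majority-vote the answer, then
    score it as (fraction of passes that agree) * (mean confidence of those passes).
    """
    if not passes:
        raise ValueError("need at least one pass")
    for p in passes:
        if not 0.0 <= p.confidence <= 1.0:
            raise ValueError(f"confidence out of range: {p.confidence}")
    votes = Counter(p.answer for p in passes)
    # most_common() orders equal counts by first occurrence, so ties go to the earliest answer.
    answer, count = votes.most_common(1)[0]
    agreeing = [p.confidence for p in passes if p.answer == answer]
    agreement = count / len(passes)
    mean_conf = sum(agreeing) / len(agreeing)
    return answer, agreement * mean_conf


# Two passes that disagree: the winning answer's 0.9 is halved by the 50% agreement.
print(ensemble_confidence([PassResult("A", 0.9), PassResult("B", 0.9)]))  # → ('A', 0.45)
```

The design intent is that disagreement between passes pulls down a score that any single pass would have reported as high, which is one simple way an ensemble can counteract overconfidence.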

Findings

1. Reduces Expected Calibration Error (ECE) by up to 88% across the evaluated benchmarks (TeleQnA, ORANBench, srsRANBench).
2. Standard single-pass, verbalized confidence estimates from telecom LLMs are often overconfident, assigning high confidence to incorrect answers.
3. Twin-Pass CoT-Ensembling makes LLM self-assessment more trustworthy.
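For reference, ECE is commonly computed by grouping predictions into equal-width confidence bins and summing, over bins, the gap between accuracy and mean confidence weighted by the share of predictions in that bin. The sketch below assumes 10 equal-width bins and one verbalized confidence in [0, 1] per question; the paper's exact binning is not stated here, and `expected_calibration_error` is a hypothetical helper name.

```python
import math


def expected_calibration_error(
    confidences: list[float], correct: list[bool], n_bins: int = 10
) -> float:
    """Equal-width binned ECE = sum over bins b of (|B_b| / N) * |acc(B_b) - conf(B_b)|."""
    if not confidences or len(confidences) != len(correct):
        raise ValueError("need equally long, non-empty confidence and correctness lists")
    bins: list[list[tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        if not 0.0 <= conf <= 1.0:
            raise ValueError(f"confidence out of range: {conf}")
        # Bin b covers (b/n_bins, (b+1)/n_bins]; a confidence of exactly 0 goes to bin 0.
        # Values sitting exactly on an edge may shift one bin due to float rounding.
        idx = min(max(math.ceil(conf * n_bins) - 1, 0), n_bins - 1)
        bins[idx].append((conf, bool(ok)))
    n = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(ok for _, ok in members) / len(members)
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece


# Overconfident toy model: always says 0.9, but only half of its answers are right.
print(expected_calibration_error([0.9, 0.9, 0.9, 0.9], [True, False, False, True]))
```

The printed value is about 0.4: all four answers fall into one bin whose accuracy (0.5) trails its mean confidence (0.9), which is the overconfidence pattern described above. The bin count is a free parameter, so ECE values computed with different binning schemes are not directly comparable.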

Abstract

Large Language Models (LLMs) are increasingly applied to complex telecommunications tasks, including 3GPP specification analysis and O-RAN network troubleshooting. However, a critical limitation remains: LLM-generated confidence scores are often biased and unreliable, frequently exhibiting systematic overconfidence. This lack of trustworthy self-assessment makes it difficult to verify model outputs and safely rely on them in practice. In this paper, we study confidence calibration in telecom-domain LLMs using the representative Gemma-3 model family (4B, 12B, and 27B parameters), evaluated on TeleQnA, ORANBench, and srsRANBench. We show that standard single-pass, verbalized confidence estimates fail to reflect true correctness, often assigning high confidence to incorrect predictions. To address this, we propose a novel Twin-Pass Chain of Thought (CoT)-Ensembling methodology for…
