Toward Low-Latency End-to-End Voice Agents for Telecommunications Using Streaming ASR, Quantized LLMs, and Real-Time TTS

Vignesh Ethiraj; Ashwath David; Sidhanth Menon; Divya Vijay

arXiv:2508.04721 · cs.SD · August 8, 2025

TL;DR

This paper presents a low-latency, end-to-end voice AI pipeline for telecommunications, integrating specialized models for speech recognition, language understanding, retrieval, and speech synthesis to enable real-time, domain-specific voice agents.
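The latency benefit of chaining streaming stages can be illustrated with Python generators. In the minimal sketch below, `asr_stream`, `llm_stream`, `tts_stream`, and `run_turn` are hypothetical stand-ins that do not call TTE, TSLAM, or T-Synth, and the retrieval step is omitted. Because each stage consumes its input lazily, the TTS stage can emit audio for the first complete sentence before the language model has finished the reply.

```python
from typing import Iterator, List

# Hypothetical stand-ins for the ASR, LLM, and TTS stages; they only show
# how streaming stages chain, not how the paper's models work internally.

def asr_stream(audio_frames: List[str]) -> Iterator[str]:
    """Yield a growing partial transcript as each audio frame arrives."""
    words: List[str] = []
    for frame in audio_frames:
        words.append(frame)          # pretend each frame decodes to one word
        yield " ".join(words)

def llm_stream(prompt: str) -> Iterator[str]:
    """Yield response tokens one at a time (canned reply for illustration)."""
    reply = f"You asked about {prompt}. Let me check that for you."
    for token in reply.split(" "):
        yield token + " "

def tts_stream(tokens: Iterator[str]) -> Iterator[str]:
    """Emit one placeholder audio chunk per completed sentence, not per full reply."""
    buffer = ""
    for token in tokens:
        buffer += token
        if buffer.rstrip().endswith((".", "?", "!")):
            yield f"<audio:{buffer.strip()}>"
            buffer = ""
    if buffer.strip():               # flush any trailing text without end punctuation
        yield f"<audio:{buffer.strip()}>"

def run_turn(audio_frames: List[str]) -> List[str]:
    """Run one user turn: final ASR hypothesis -> LLM tokens -> sentence-level TTS."""
    transcript = ""
    for partial in asr_stream(audio_frames):
        transcript = partial         # keep the latest (and finally complete) hypothesis
    return list(tts_stream(llm_stream(transcript)))
```

Calling `next(tts_stream(llm_stream(...)))` returns the first sentence's audio chunk as soon as that sentence is complete, which is the property a real-time voice agent relies on to keep time-to-first-audio low.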

Contribution

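The paper introduces a telecom-specific AI voice agent pipeline built from four specialized models (TTE for streaming ASR, TSLAM for language understanding, T-VEC for retrieval embeddings, and T-Synth for TTS), achieving low-latency, real-time performance for telecom applications.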

Findings

01  Models achieve real-time factors below 1.0
02  Pipeline supports low-latency, domain-specific interactions
03  System outperforms existing benchmarks in telecom voice AI
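The real-time factor (RTF) is the time a model spends processing audio divided by the duration of that audio, so an RTF below 1.0 means the model runs faster than real time. The sketch below computes it; `real_time_factor` and `measure_rtf` are hypothetical helpers, and the 1.2 s / 4.0 s figures are made up for illustration rather than taken from the paper.

```python
import time
from typing import Callable

def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = processing time / duration of the audio processed or produced."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds

def measure_rtf(process: Callable[[], None], audio_seconds: float) -> float:
    """Time one call to `process` with a monotonic clock and convert it to an RTF."""
    start = time.perf_counter()
    process()
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, audio_seconds)

# Hypothetical numbers: 1.2 s to transcribe 4.0 s of speech gives RTF = 0.3,
# i.e. the model finishes well before the audio would have finished playing.
print(real_time_factor(1.2, 4.0))
```

For ASR the denominator is the length of the input speech; for TTS it is the length of the synthesized audio.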

Abstract

We introduce a low-latency telecom AI voice agent pipeline for real-time, interactive telecommunications use, enabling advanced voice AI for call center automation, intelligent IVR (Interactive Voice Response), and AI-driven customer support. The solution is built for telecom, combining four specialized models by NetoAI: TSLAM, a 4-bit quantized Telecom-Specific Large Language Model (LLM); T-VEC, a Telecom-Specific Embedding Model; TTE, a Telecom-Specific Automatic Speech Recognition (ASR) model; and T-Synth, a Telecom-Specific Text-to-Speech (TTS) model. These models enable highly responsive, domain-adapted voice AI agents supporting knowledge-grounded spoken interactions with low latency. The pipeline integrates streaming ASR (TTE), conversational intelligence (TSLAM), retrieval augmented generation (RAG) over telecom documents, and real-time TTS (T-Synth), setting a new benchmark for…
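The abstract describes TSLAM as 4-bit quantized but does not state the scheme here. As a generic illustration only, not TSLAM's actual method, the sketch below applies symmetric absmax quantization to one hypothetical weight group: each weight becomes a signed 4-bit code in [-7, 7] sharing one float scale, and two codes are packed per byte. The function names and the sample weights are assumptions made for this example.

```python
from typing import List, Tuple

def quantize_int4(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric absmax quantization of one weight group to 4-bit integers in [-7, 7]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 7 if max_abs > 0 else 1.0   # one float scale shared by the group
    codes = [max(-7, min(7, round(w / scale))) for w in weights]
    return codes, scale

def dequantize_int4(codes: List[int], scale: float) -> List[float]:
    """Recover approximate weights; rounding error per weight is about scale / 2 at most."""
    return [c * scale for c in codes]

def pack_int4(codes: List[int]) -> bytes:
    """Store two signed 4-bit codes per byte (low nibble first, two's complement)."""
    if len(codes) % 2:
        codes = codes + [0]   # pad to an even count
    return bytes((lo & 0xF) | ((hi & 0xF) << 4) for lo, hi in zip(codes[0::2], codes[1::2]))

def unpack_int4(data: bytes, count: int) -> List[int]:
    """Inverse of pack_int4: split bytes into nibbles and restore the sign."""
    codes: List[int] = []
    for byte in data:
        for nibble in (byte & 0xF, byte >> 4):
            codes.append(nibble - 16 if nibble >= 8 else nibble)
    return codes[:count]

group = [0.7, -0.35, 0.1, 0.0, -0.62]   # hypothetical weight group
codes, scale = quantize_int4(group)
packed = pack_int4(codes)                # 5 codes fit in 3 bytes versus 20 bytes as float32
restored = dequantize_int4(unpack_int4(packed, len(codes)), scale)
```

Production 4-bit schemes typically add refinements such as small group sizes, asymmetric zero points, or calibration data; the sketch only shows why the weights shrink to roughly an eighth of their float32 size.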