Beyond Unified Models: A Service-Oriented Approach to Low Latency, Context Aware Phonemization for Real Time TTS
Mahta Fetrat, Donya Navabi, Zahra Dehghanian, Morteza Abolghasemi, Hamid R. Rabiee

TL;DR
This paper presents a service-oriented TTS architecture that decouples complex phonemization modules from the core system, enabling real-time, high-quality, context-aware speech synthesis suitable for end-user devices.
Contribution
The paper introduces a framework that runs lightweight and heavy phonemization components as independent services, balancing phonemization quality against inference speed in real-time TTS.
Findings
Improved pronunciation soundness and linguistic accuracy.
Achieved real-time performance with high-quality phonemization.
Effective decoupling of modules enhances system scalability.
Abstract
Lightweight, real-time text-to-speech systems are crucial for accessibility. However, the most efficient TTS models often rely on lightweight phonemizers that struggle with context-dependent challenges. In contrast, more advanced phonemizers with a deeper linguistic understanding typically incur high computational costs, which prevents real-time performance. This paper examines the trade-off between phonemization quality and inference speed in G2P-aided TTS systems, introducing a practical framework to bridge this gap. We propose lightweight strategies for context-aware phonemization and a service-oriented TTS architecture that executes these modules as independent services. This design decouples heavy context-aware components from the core TTS engine, effectively breaking the latency barrier and enabling real-time use of high-quality phonemization models. Experimental results confirm…
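The decoupling described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a heavy, context-aware phonemizer running outside the caller's thread (a thread pool stands in for a separate service process) and a core engine that falls back to a lightweight phonemizer when a latency budget is exceeded. All names (`light_phonemize`, `heavy_phonemize`, `PhonemizerService`, `phonemize_with_budget`), the toy lexicon, the homograph rule, and the timeout-fallback policy itself are hypothetical and chosen only for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

# Hypothetical toy lexicon; a real G2P model is far richer.
LIGHT_LEXICON = {
    "i": "AY",
    "read": "R IY D",
    "the": "DH AH",
    "book": "B UH K",
    "yesterday": "Y EH S T ER D EY",
}


def light_phonemize(text):
    """Fast, context-free lookup (stand-in for a lightweight phonemizer)."""
    return [LIGHT_LEXICON.get(word, "<unk>") for word in text.lower().split()]


def heavy_phonemize(text, delay_s=0.0):
    """Slow, context-aware stand-in that resolves the homograph 'read'."""
    time.sleep(delay_s)  # simulates the inference cost of a large model
    words = text.lower().split()
    phones = light_phonemize(text)
    past_tense = "yesterday" in words  # toy context rule, illustrative only
    return [
        "R EH D" if (word == "read" and past_tense) else phone
        for word, phone in zip(words, phones)
    ]


class PhonemizerService:
    """Runs the heavy phonemizer outside the caller's thread.

    Assumption: a single-worker thread pool stands in for an independent
    service that would normally live in its own process or host.
    """

    def __init__(self):
        self._pool = ThreadPoolExecutor(max_workers=1)

    def submit(self, text, delay_s=0.0):
        return self._pool.submit(heavy_phonemize, text, delay_s)

    def close(self):
        self._pool.shutdown(wait=True)


def phonemize_with_budget(service, text, budget_s, delay_s=0.0):
    """Request heavy phonemes; fall back to the light path on timeout."""
    future = service.submit(text, delay_s)
    try:
        return future.result(timeout=budget_s), "heavy"
    except FutureTimeout:
        return light_phonemize(text), "light"
```

In this sketch the core engine never blocks longer than `budget_s`: when the service answers in time, the context-aware pronunciation is used; otherwise the lightweight result keeps synthesis moving. Whether the paper uses a fallback of this kind is not stated in the abstract.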
Taxonomy
Topics: Speech Recognition and Synthesis · ICT in Developing Communities · Speech and dialogue systems
