SFMS-ALR: Script-First Multilingual Speech Synthesis with Adaptive Locale Resolution
Dharma Teja Donepudi

TL;DR
SFMS-ALR is a flexible, real-time multilingual speech synthesis framework that segments text by script, identifies language adaptively, and normalizes prosody, enabling natural code-switching without retraining existing TTS models.
Contribution
Introduces SFMS-ALR, a novel engine-agnostic framework for multilingual TTS that handles code-switching through script segmentation and adaptive locale resolution, requiring no retraining.
Findings
Supports seamless integration with existing TTS voices.
Demonstrates improved naturalness and intelligibility in multilingual synthesis.
Provides a modular baseline for multilingual TTS evaluation.
Abstract
Intra-sentence multilingual speech synthesis (code-switching TTS) remains a major challenge due to abrupt language shifts, varied scripts, and mismatched prosody between languages. Conventional TTS systems are typically monolingual and fail to produce natural, intelligible speech in mixed-language contexts. We introduce Script-First Multilingual Synthesis with Adaptive Locale Resolution (SFMS-ALR), an engine-agnostic framework for fluent, real-time code-switched speech generation. SFMS-ALR segments input text by Unicode script, applies adaptive language identification to determine each segment's language and locale, and normalizes prosody using sentiment-aware adjustments to preserve expressive continuity across languages. The algorithm generates a unified SSML representation with appropriate "lang" or "voice" spans and synthesizes the utterance in a single TTS request. Unlike…
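The abstract describes a pipeline of script segmentation, per-segment locale resolution, and a single SSML document with language spans. The sketch below illustrates only the segmentation and SSML-assembly steps, under clearly stated assumptions. Python's `unicodedata` module exposes no Unicode Script property, so each letter's script is approximated from the first word of its character name. The hypothetical `SCRIPT_TO_LOCALE` table stands in for the paper's adaptive language identification, which is not specified here. Sentiment-aware prosody normalization is omitted. The output uses SSML 1.1 `<lang xml:lang="...">` spans rather than engine-specific `<voice>` switching. The function names `segment_by_script`, `resolve_locale`, and `build_ssml` are illustrative and do not come from the paper.

```python
import unicodedata
from itertools import groupby
from xml.sax.saxutils import escape, quoteattr

# Hypothetical script -> default locale table. This is a placeholder for the
# adaptive language identification step, not the paper's actual method.
SCRIPT_TO_LOCALE = {
    "LATIN": "en-US",
    "DEVANAGARI": "hi-IN",
    "TELUGU": "te-IN",
    "ARABIC": "ar-SA",
    "CJK": "zh-CN",
    "HIRAGANA": "ja-JP",
    "KATAKANA": "ja-JP",
    "HANGUL": "ko-KR",
}


def char_script(ch):
    """Approximate a letter's script from its Unicode character name.

    Non-letters (spaces, digits, punctuation, combining marks such as
    Devanagari vowel signs) return None and are treated as script-neutral.
    """
    if not unicodedata.category(ch).startswith("L"):
        return None
    name = unicodedata.name(ch, "")
    return name.split(" ")[0] if name else None


def segment_by_script(text):
    """Split text into (script, chunk) runs that each contain one script."""
    runs = []  # list of [script, chunk]
    for ch in text:
        script = char_script(ch)
        if not runs:
            runs.append([script, ch])
        elif script is None or script == runs[-1][0]:
            runs[-1][1] += ch  # neutral chars and same-script letters extend the run
        elif runs[-1][0] is None:
            runs[-1][0] = script  # a leading neutral run adopts the first real script
            runs[-1][1] += ch
        else:
            runs.append([script, ch])  # script change starts a new segment
    return [(script, chunk) for script, chunk in runs]


def resolve_locale(script, default_locale):
    """Map a script to a locale, falling back to the utterance default."""
    if script is None:
        return default_locale
    return SCRIPT_TO_LOCALE.get(script, default_locale)


def build_ssml(text, default_locale="en-US"):
    """Emit one SSML document with <lang> spans for non-default segments."""
    tagged = [(resolve_locale(s, default_locale), c) for s, c in segment_by_script(text)]
    parts = []
    # Merge consecutive segments that resolved to the same locale.
    for locale, group in groupby(tagged, key=lambda pair: pair[0]):
        body = escape("".join(chunk for _, chunk in group))
        if locale == default_locale:
            parts.append(body)
        else:
            parts.append(f"<lang xml:lang={quoteattr(locale)}>{body}</lang>")
    header = (
        '<speak version="1.1" xmlns="http://www.w3.org/2001/10/synthesis" '
        f"xml:lang={quoteattr(default_locale)}>"
    )
    return header + "".join(parts) + "</speak>"


print(build_ssml("Hello नमस्ते world"))
# <speak version="1.1" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">Hello <lang xml:lang="hi-IN">नमस्ते </lang>world</speak>
```

A fixed script-to-locale table cannot separate languages that share a script (English and Spanish both map to Latin), and it splits Japanese text into CJK and kana runs with different locales. Those are the cases an adaptive language-identification step would need to resolve before the SSML is assembled.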
