TL;DR
Praxy Voice introduces a zero-cost method for adapting non-Indic TTS bases to high-quality Indic speech, combining phoneme space mapping, a lightweight LoRA adapter, and a voice-prompt recovery technique.
Contribution
It presents an approach that reaches commercial-class Indic TTS output without commercial TTS training data and without training a new acoustic decoder; the only trained component is a LoRA adapter on the text-token predictor.
Findings
Matches or slightly exceeds commercial baselines on key benchmarks
Reduces code-mix WER significantly with a native-script transliteration branch
Effective zero-cost adaptation for Indic TTS from non-Indic bases
Abstract
Commercial TTS systems produce near-native Indic audio, but the best open-source bases (Chatterbox, Indic Parler-TTS, IndicF5) trail them on measured phonological dimensions, and the most widely adopted multilingual base (Chatterbox, 23 languages) does not even tokenise Telugu or Tamil. We ask: what is the minimum intervention that brings such a non-Indic-native base to commercial-class output on Telugu, Tamil, and Hindi, without training a new acoustic decoder and without any commercial TTS training data? We combine three pieces: (1) BUPS, a Brahmic Unified Phoneme Space that deterministically romanises seven Indic scripts to ISO-15919 so Chatterbox's Latin tokeniser can process them; (2) a LoRA adapter on only the text-token predictor (Chatterbox's t3), trained on ~1,220h of licensed Indic audio with a Hindi-proxy language_id; (3) a voice-prompt recovery recipe -- an 8-11s…
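The abstract does not give the BUPS mapping rules, so the following is only a toy sketch of what deterministic romanisation of Brahmic text to ISO-15919-style Latin might look like. It is not the authors' implementation. The function names (`to_devanagari_slot`, `romanise`), the small character tables, and the shortcut of aligning scripts through their parallel Unicode block layout are all assumptions made for illustration; a real mapper would need full tables and per-script exceptions.

```python
# Toy ISO-15919-style romaniser. Hypothetical sketch, NOT the paper's BUPS.
# Assumption: the major Indic Unicode blocks (U+0900..U+0D7F) place
# corresponding letters at the same offset within each 128-codepoint block,
# so a letter can be looked up via its Devanagari counterpart.

CONSONANTS = {
    "क": "k", "ग": "g", "त": "t", "द": "d", "न": "n", "प": "p",
    "म": "m", "र": "r", "ल": "l", "स": "s", "ह": "h",
    "\u0934": "ḻ",  # slot used by Tamil LLLA (U+0BB4)
}
INDEPENDENT_VOWELS = {
    "अ": "a", "आ": "ā", "इ": "i", "ई": "ī",
    "उ": "u", "ऊ": "ū", "ए": "ē", "ओ": "ō",
}
MATRAS = {  # dependent vowel signs, written as escapes for readability
    "\u093E": "ā", "\u093F": "i", "\u0940": "ī", "\u0941": "u",
    "\u0942": "ū", "\u0946": "e", "\u0947": "ē", "\u094B": "ō",
}
VIRAMA = "\u094D"  # suppresses the inherent vowel "a"


def to_devanagari_slot(ch: str) -> str:
    """Shift a character from any covered Indic block to the Devanagari block."""
    cp = ord(ch)
    if 0x0900 <= cp <= 0x0D7F:
        return chr(0x0900 + (cp - 0x0900) % 0x80)
    return ch


def romanise(text: str) -> str:
    """Deterministically romanise text, tracking the inherent vowel."""
    out = []
    pending_a = False  # True when the last consonant still owes its inherent "a"
    for raw in text:
        ch = to_devanagari_slot(raw)
        if ch in CONSONANTS:
            if pending_a:
                out.append("a")
            out.append(CONSONANTS[ch])
            pending_a = True
        elif ch in MATRAS:
            out.append(MATRAS[ch])  # explicit vowel replaces the inherent "a"
            pending_a = False
        elif ch == VIRAMA:
            pending_a = False  # bare consonant, no vowel
        else:
            if pending_a:
                out.append("a")
                pending_a = False
            # Unknown characters pass through unchanged.
            out.append(INDEPENDENT_VOWELS.get(ch, raw))
    if pending_a:
        out.append("a")
    return "".join(out)


if __name__ == "__main__":
    for word in ["नमस्ते", "తెలుగు", "தமிழ்"]:
        print(word, "->", romanise(word))
```

The point of the sketch is that the output is plain Latin text with diacritics, which is the kind of input a Latin tokeniser can consume even when it has no entries for Telugu or Tamil script.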
