Synthesizing the Virtual Advocate: A Multi-Persona Speech Generation Framework for Diverse Linguistic Jurisdictions in Indic Languages
Aniket Deroy

TL;DR
This paper evaluates multilingual TTS models for synthetic courtroom speech in Indic languages, proposes a prompting framework to generate advocate personas, and discusses current limitations in emotional expressiveness and phonological diversity.
Contribution
It introduces a prompting framework that uses the Gemini 2.5 TTS models for multilingual legal speech synthesis and analyzes the models' performance and challenges across diverse linguistic contexts.
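The summary does not reproduce the paper's actual prompts, so the following is only a minimal sketch of how a persona-plus-language prompt might be assembled before being handed to a TTS model. The `AdvocatePersona` fields, the `build_prompt` helper, and the prompt wording are all hypothetical and are not taken from the paper; no TTS API is called here.

```python
from dataclasses import dataclass

# The five Indic languages named in the abstract.
LANGUAGES = ("Tamil", "Telugu", "Bengali", "Hindi", "Gujarati")


@dataclass(frozen=True)
class AdvocatePersona:
    """Hypothetical persona description; field names are illustrative only."""
    role: str
    tone: str
    pacing: str


def build_prompt(persona: AdvocatePersona, language: str, script: str) -> str:
    """Combine persona instructions and the courtroom text into one prompt string."""
    if language not in LANGUAGES:
        raise ValueError(f"unsupported language: {language}")
    return (
        f"Speak in {language} as a {persona.role}. "
        f"Tone: {persona.tone}. Pacing: {persona.pacing}.\n"
        f"Text: {script}"
    )


def build_all_prompts(persona: AdvocatePersona, scripts: dict) -> dict:
    """Build one prompt per language for which a script is supplied."""
    return {
        lang: build_prompt(persona, lang, scripts[lang])
        for lang in LANGUAGES
        if lang in scripts
    }


# Example persona (assumed values, chosen to echo the abstract's wording).
senior_counsel = AdvocatePersona(
    role="senior advocate addressing the bench",
    tone="authoritative and measured",
    pacing="deliberate, with pauses for emphasis",
)
```

Keeping the persona separate from the language and the script would let the same persona be rendered in each language, which is one way the cross-language comparison described in the findings could be set up.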
Findings
Models perform well in procedural speech delivery.
Models struggle with emotional modulation and vocal dynamics.
Performance varies across languages, with Bengali and Gujarati showing lower quality.
Abstract
Legal advocacy requires a unique combination of authoritative tone, rhythmic pausing for emphasis, and emotional intelligence. This study investigates the performance of the Gemini 2.5 Flash TTS and Gemini 2.5 Pro TTS models in generating synthetic courtroom speeches across five Indic languages: Tamil, Telugu, Bengali, Hindi, and Gujarati. We propose a prompting framework that utilizes Gemini 2.5's native support for 5 languages and its context-aware pacing to produce distinct advocate personas. The evolution of Large Language Models (LLMs) has shifted the focus of Text-to-Speech (TTS) technology from basic intelligibility to context-aware, expressive synthesis. In the legal domain, synthetic speech must convey authority and a specific professional persona, a task that becomes significantly more complex in the linguistically diverse landscape of India. The models exhibit a "monotone…
Taxonomy
Topics: Topic Modeling · AI in Service Interactions · Multimodal Machine Learning Applications
