PSP: An Interpretable Per-Dimension Accent Benchmark for Indic Text-to-Speech

Venkata Pushpak Teja Menta

arXiv:2604.25476·cs.SD·April 29, 2026

TL;DR

This paper introduces PSP (Phoneme Substitution Profile), an interpretable benchmark that scores accent in Indic TTS systems along six dimensions. It finds that no single system leads on all six and that PSP rankings differ from WER-based rankings.

Contribution

The paper proposes a per-phonological-dimension accent benchmark for Indic TTS that decomposes accent into six interpretable metrics (RR, AF, LF, ZF, FAD, PSD) and uses them to benchmark multiple systems.

Findings

01 Retroflex collapse increases with phonological difficulty: Hindi < Telugu < Tamil.

02 PSP ranks systems differently from WER, so accent fidelity is not captured by intelligibility scores alone.

03 No single system excels across all six accent dimensions.

Abstract

Standard text-to-speech (TTS) evaluation measures intelligibility (WER, CER) and overall naturalness (MOS, UTMOS) but does not quantify accent. A synthesiser may score well on all four yet sound non-native on features that are phonemic in the target language. For Indic languages, these features include retroflex articulation, aspiration, vowel length, and the Tamil retroflex approximant (letter zha). We present PSP, the Phoneme Substitution Profile, an interpretable, per-phonological-dimension accent benchmark for Indic TTS. PSP decomposes accent into six complementary dimensions: retroflex collapse rate (RR), aspiration fidelity (AF), vowel-length fidelity (LF), Tamil-zha fidelity (ZF), Frechet Audio Distance (FAD), and prosodic signature divergence (PSD). The first four are measured via forced alignment plus native-speaker-centroid acoustic probes over Wav2Vec2-XLS-R layer-9…
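The centroid-probe idea in the abstract can be illustrated with a small sketch. This is not the paper's implementation: the abstract is truncated before the probe is fully described, so the use of cosine similarity, the retroflex-versus-dental contrast, and the function names `cosine_similarity` and `retroflex_collapse_rate` are all assumptions made for illustration. The inputs are plain arrays standing in for one embedding per force-aligned segment; extracting them from Wav2Vec2-XLS-R is left out.

```python
import numpy as np


def cosine_similarity(embeddings: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """Cosine similarity between each row of `embeddings` and each row of `centroids`."""
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    c = centroids / np.linalg.norm(centroids, axis=1, keepdims=True)
    return e @ c.T


def retroflex_collapse_rate(segment_embeddings, retroflex_centroid, dental_centroid) -> float:
    """Fraction of expected-retroflex segments that sit closer to the dental centroid.

    Assumption: "collapse" is read here as a retroflex target being realised
    nearer to a native-speaker dental centroid than to the retroflex one.
    """
    segments = np.asarray(segment_embeddings, dtype=float)
    centroids = np.stack([
        np.asarray(retroflex_centroid, dtype=float),
        np.asarray(dental_centroid, dtype=float),
    ])
    sims = cosine_similarity(segments, centroids)  # shape: (n_segments, 2)
    collapsed = sims[:, 1] > sims[:, 0]            # closer to dental than retroflex
    return float(np.mean(collapsed))


# Toy 2-D vectors (hypothetical values, not real model embeddings).
toy_segments = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3], [0.1, 0.9]]
print(retroflex_collapse_rate(toy_segments, [1.0, 0.0], [0.0, 1.0]))  # 0.5
```

In the toy data, two of the four segments fall nearer the dental centroid, giving a rate of 0.5. A lower rate means the retroflex contrast is better preserved under this reading of the metric.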

Code & Models

Repositories

praxelhq/psp-eval
github

Datasets

Praxel/psp-native-centroids
dataset · 216 downloads
