PolyNorm: Few-Shot LLM-Based Text Normalization for Text-to-Speech

Michel Wong; Ali Alshehri; Sophia Kao; Haotian He

arXiv:2511.03080·cs.CL·November 6, 2025

TL;DR

PolyNorm introduces a prompt-based, multilingual text normalization method using LLMs that reduces manual effort and improves accuracy across diverse languages for TTS systems.

Contribution

The paper presents a novel LLM-based, language-agnostic approach to text normalization, together with an automatic data pipeline, enabling scalable coverage of low-resource languages.

Findings

01 Consistent WER reductions across eight languages

02 Effective in low-resource language settings

03 Provides a multilingual benchmark dataset

Abstract

Text Normalization (TN) is a key preprocessing step in Text-to-Speech (TTS) systems, converting written forms into their canonical spoken equivalents. Traditional TN systems can exhibit high accuracy, but involve substantial engineering effort, are difficult to scale, and pose challenges to language coverage, particularly in low-resource settings. We propose PolyNorm, a prompt-based approach to TN using Large Language Models (LLMs), aiming to reduce the reliance on manually crafted rules and enable broader linguistic applicability with minimal human intervention. Additionally, we present a language-agnostic pipeline for automatic data curation and evaluation, designed to facilitate scalable experimentation across diverse languages. Experiments across eight languages show consistent reductions in the word error rate (WER) compared to a production-grade system. To support further…
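For readers unfamiliar with the metric named in the abstract, the following is a minimal sketch of word error rate in its standard form: the word-level edit distance (substitutions, insertions, deletions) between a reference and a hypothesis, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration; the abstract does not say how the authors tokenize or score normalized text, so this should not be read as their evaluation code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: word-level Levenshtein distance / reference length.

    Assumption: words are separated by whitespace. Languages written
    without spaces would need a different tokenizer.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical normalized outputs, not taken from the paper.
    reference = "twenty five dollars"
    hypothesis = "twenty five dollar"
    print(word_error_rate(reference, hypothesis))
```

In this hypothetical example, one of the three reference words is wrong, so the score is one third. Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.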

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Topic Modeling