Minimally Supervised Written-to-Spoken Text Normalization

Ke Wu, Kyle Gorman, and Richard Sproat

arXiv:1609.06649 · cs.CL · September 22, 2016

TL;DR

This paper explores minimally supervised methods for text normalization in speech applications, comparing approaches with varying levels of language-specific knowledge and data availability, and evaluates their effectiveness on English and Russian.

Contribution

It introduces and evaluates a framework for text normalization that reduces reliance on extensive hand-crafted grammars and aligned data, using universal covering grammars and hallucinated data.

Findings

1. Universal covering grammars perform competitively with hand-crafted grammars.

2. Hallucinated data can effectively substitute for aligned corpora in training.

3. Approaches are validated on both English and Russian datasets.

Abstract

In speech applications such as text-to-speech (TTS) or automatic speech recognition (ASR), text normalization refers to the task of converting from a written representation into a representation of how the text is to be spoken. In all real-world speech applications, the text normalization engine is developed, in large part, by hand. For example, a hand-built grammar may be used to enumerate the possible ways of saying a given token in a given language, and a statistical model used to select the most appropriate pronunciation in context. In this study we examine the tradeoffs associated with using more or less language-specific domain knowledge in a text normalization engine. In the most data-rich scenario, we have access to a carefully constructed hand-built normalization grammar that for any given token will produce a set of all possible verbalizations for that…
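
The enumerate-then-select setup described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the lookup table stands in for a hand-built grammar, `toy_score` is a hand-written stand-in for a statistical model, and none of the names come from the paper.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for a hand-built grammar: for each written token,
# list every spoken form the grammar would license.
TOY_GRAMMAR: Dict[str, List[str]] = {
    "1984": [
        "one thousand nine hundred eighty four",
        "nineteen eighty four",
        "one nine eight four",
    ],
}


def candidate_verbalizations(token: str) -> List[str]:
    """Enumerate candidate spoken forms; unknown tokens pass through unchanged."""
    return TOY_GRAMMAR.get(token, [token])


def toy_score(left_context: List[str], candidate: str) -> float:
    """Illustrative scorer (not a trained model): prefer a year reading after 'in'."""
    if left_context and left_context[-1].lower() == "in":
        if candidate.startswith("nineteen"):
            return 1.0
    if candidate.startswith("one thousand"):
        return 0.5  # default to the cardinal reading otherwise
    return 0.0


def select_verbalization(
    token: str,
    left_context: List[str],
    score: Callable[[List[str], str], float],
) -> str:
    """Pick the highest-scoring candidate for the token in its context."""
    candidates = candidate_verbalizations(token)
    return max(candidates, key=lambda c: score(left_context, c))


if __name__ == "__main__":
    print(select_verbalization("1984", ["born", "in"], toy_score))
    print(select_verbalization("1984", ["it", "costs"], toy_score))
```

In this toy version the grammar only proposes candidates and the scorer alone decides among them, which mirrors the division of labor the abstract describes between the grammar and the statistical model.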

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis