Non-Standard Vietnamese Word Detection and Normalization for Text-to-Speech

Huu-Tien Dang; Thi-Hai-Yen Vuong; Xuan-Hieu Phan

arXiv:2209.02971 · cs.CL · September 8, 2022 · 1 citation

Open Access

TL;DR

This paper presents a two-phase approach to detecting and normalizing Vietnamese non-standard words (NSWs) in TTS systems: sequence-labeling models (CRF, BiLSTM-CNN-CRF, BERT-BiGRU-CRF) detect NSWs, and rule-based algorithms then expand each NSW according to its type, improving accuracy across diverse NSW types.

Contribution

The paper introduces a novel combination of model-based tagging and rule-based normalization tailored specifically to Vietnamese NSWs in TTS applications.

Findings

01

The BiLSTM-CNN-CRF and BERT-BiGRU-CRF models both achieve F1 scores above 90%.

02

The approach reduces sentence error rates to below 8%.

03

BERT-BiGRU-CRF yields the highest F1 score of 95%.
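Finding 02 reports a sentence error rate. In text normalization work this is usually the share of sentences whose normalized output is not entirely correct. The sketch below assumes that definition, using an exact string match against a reference normalization; the excerpt does not state the paper's precise definition, and the function name is hypothetical.

```python
def sentence_error_rate(predictions: list[str], references: list[str]) -> float:
    """Fraction of sentences whose normalized output differs from the reference.

    Assumption: a sentence counts as one error if its whole normalized string
    does not exactly match the reference (ignoring surrounding whitespace).
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("need at least one sentence")
    errors = sum(1 for p, r in zip(predictions, references) if p.strip() != r.strip())
    return errors / len(references)
```

Under this definition, a single wrongly expanded NSW makes the whole sentence count as an error. That makes the sentence error rate stricter than token-level F1, which explains why the two kinds of figures are reported separately.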

Abstract

Converting written text into its spoken form is an essential problem in any text-to-speech (TTS) system. However, building an effective text normalization solution for a real-world TTS system faces two main challenges: (1) the semantic ambiguity of non-standard words (NSWs), e.g., numbers, dates, ranges, scores, and abbreviations; and (2) transforming NSWs such as URLs, email addresses, hashtags, and contact names into pronounceable syllables. In this paper, we propose a new two-phase normalization approach to deal with these challenges. First, a model-based tagger is designed to detect NSWs. Then, depending on the NSW type, a rule-based normalizer expands those NSWs into their final verbal forms. We conducted three empirical experiments for NSW detection using Conditional Random Fields (CRFs), BiLSTM-CNN-CRF, and BERT-BiGRU-CRF models on a manually annotated dataset including 5819 sentences…
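The detect-then-expand structure described in the abstract can be sketched as a small pipeline. This is only an illustration under loud assumptions. The paper's phase 1 is a trained sequence tagger (CRF, BiLSTM-CNN-CRF, or BERT-BiGRU-CRF), and here a few regular expressions stand in for it. The tag names (`NUM`, `DATE`, `EMAIL`) and all function names are hypothetical rather than the paper's label set. The Vietnamese number reading covers only 0 to 99 with simplified rules.

```python
import re

# Phase 1 stand-in: regexes play the role of the paper's trained tagger.
# Tag names are hypothetical, not the paper's label set.
PATTERNS = [
    ("DATE", re.compile(r"^\d{1,2}/\d{1,2}$")),
    ("NUM", re.compile(r"^\d{1,2}$")),
    ("EMAIL", re.compile(r"^[\w.]+@[\w.]+$")),
]

DIGITS = ["không", "một", "hai", "ba", "bốn", "năm", "sáu", "bảy", "tám", "chín"]


def detect(tokens: list[str]) -> list[str]:
    """Phase 1: label each token with an NSW type, or 'O' for ordinary words."""
    return [next((name for name, pat in PATTERNS if pat.match(tok)), "O") for tok in tokens]


def read_number(n: int) -> str:
    """Read an integer 0-99 in Vietnamese (simplified rules)."""
    if not 0 <= n <= 99:
        raise ValueError("this sketch only handles 0-99")
    if n < 10:
        return DIGITS[n]
    tens, units = divmod(n, 10)
    words = ["mười"] if tens == 1 else [DIGITS[tens], "mươi"]
    if units == 1 and tens > 1:
        words.append("mốt")  # 21 -> "hai mươi mốt"
    elif units == 5:
        words.append("lăm")  # 15 -> "mười lăm", 25 -> "hai mươi lăm"
    elif units > 0:
        words.append(DIGITS[units])
    return " ".join(words)


def expand(token: str, tag: str) -> str:
    """Phase 2: rule-based expansion selected by the detected NSW type."""
    if tag == "NUM":
        return read_number(int(token))
    if tag == "DATE":
        day, month = (int(x) for x in token.split("/"))
        return f"{read_number(day)} tháng {read_number(month)}"
    if tag == "EMAIL":
        # Read "@" as "a còng" and "." as "chấm".
        return token.replace("@", " a còng ").replace(".", " chấm ")
    return token


def normalize(sentence: str) -> str:
    """Run detection, then type-specific expansion, and tidy the spacing."""
    tokens = sentence.split()
    tags = detect(tokens)
    out = " ".join(expand(tok, tag) for tok, tag in zip(tokens, tags))
    return " ".join(out.split())
```

For example, `normalize("Họp lúc 9 giờ ngày 8/9 gửi email tới an.nguyen@gmail.com")` gives `"Họp lúc chín giờ ngày tám tháng chín gửi email tới an chấm nguyen a còng gmail chấm com"`. The regex stand-in also shows why the paper uses a learned tagger for phase 1. A token like `8/9` can be a date, a fraction, or a score, and only context can disambiguate it. A pattern always assigns the same tag, whereas a sequence model can condition on the surrounding words.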

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis

Methods: Conditional Random Field