Improving Robustness of Neural Inverse Text Normalization via Data-Augmentation, Semi-Supervised Learning, and Post-Aligning Method

Juntae Kim, Minkyu Lim, and Seokjin Hong

arXiv:2309.08626 · cs.CL · September 19, 2023

Open Access

TL;DR

This paper enhances neural inverse text normalization for speech recognition by combining data augmentation, semi-supervised learning, and post-alignment to improve robustness against out-of-domain ASR-generated text.

Contribution

The paper proposes training ITN models directly on ASR-generated text, with training pairs built through data augmentation and semi-supervised learning, and adds a post-aligning method, to address the out-of-domain gap between ITN training data and ASR output.

Findings

01 Significant performance improvements in ASR scenarios

02 Effective handling of out-of-domain text issues

03 Enhanced reliability of inverse text normalization

Abstract

Inverse text normalization (ITN) is crucial for converting spoken-form into written-form, especially in the context of automatic speech recognition (ASR). While most downstream tasks of ASR rely on written-form, ASR systems often output spoken-form, highlighting the necessity for robust ITN in product-level ASR-based applications. Although neural ITN methods have shown promise, they still encounter performance challenges, particularly when dealing with ASR-generated spoken text. These challenges arise from the out-of-domain problem between training data and ASR-generated text. To address this, we propose a direct training approach that utilizes ASR-generated written or spoken text, with pairs augmented through ASR linguistic context emulation and a semi-supervised learning method enhanced by a large language model, respectively. Additionally, we introduce a post-aligning method to…
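To make the spoken-form versus written-form distinction concrete, here is a minimal rule-based sketch of the ITN task itself. It is not the paper's neural method: the function name `toy_itn` and the small vocabulary tables are hypothetical, and the sketch assumes English input and handles only cardinal numbers from 0 to 99.

```python
# Toy vocabulary (an assumption for illustration): English cardinals 0-99 only.
UNITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19,
}
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}


def toy_itn(spoken: str) -> str:
    """Rewrite number words (0-99) as digits; leave every other token unchanged."""
    tokens = spoken.split()
    out = []
    i = 0
    while i < len(tokens):
        tok = tokens[i].lower()
        if tok in TENS:
            value = TENS[tok]
            # A tens word may be followed by a units word 1-9 ("twenty three").
            if i + 1 < len(tokens):
                nxt = UNITS.get(tokens[i + 1].lower(), 0)
                if 1 <= nxt <= 9:
                    value += nxt
                    i += 1  # consume the units word as well
            out.append(str(value))
        elif tok in UNITS:
            out.append(str(UNITS[tok]))
        else:
            out.append(tokens[i])
        i += 1
    return " ".join(out)


print(toy_itn("the meeting is at twenty three main street"))
# the meeting is at 23 main street
```

Hand-written rules like these only cover the patterns they list, which is one reason neural ITN models are used in practice; the abstract's concern is that such models can still degrade on ASR-generated text that differs from their training data.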

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing