Improving Robustness of Neural Inverse Text Normalization via Data-Augmentation, Semi-Supervised Learning, and Post-Aligning Method
Juntae Kim, Minkyu Lim, and Seokjin Hong

TL;DR
This paper enhances neural inverse text normalization for speech recognition by combining data augmentation, semi-supervised learning, and post-alignment to improve robustness against out-of-domain ASR-generated text.
Contribution
It introduces a novel training approach using augmented data and semi-supervised learning, along with a post-aligning method, to address out-of-domain challenges in ITN.
Findings
Significant performance improvements in ASR scenarios
Effective handling of out-of-domain text issues
Enhanced reliability of inverse text normalization
Abstract
Inverse text normalization (ITN) is crucial for converting spoken-form into written-form, especially in the context of automatic speech recognition (ASR). While most downstream tasks of ASR rely on written-form, ASR systems often output spoken-form, highlighting the necessity for robust ITN in product-level ASR-based applications. Although neural ITN methods have shown promise, they still encounter performance challenges, particularly when dealing with ASR-generated spoken text. These challenges arise from the out-of-domain problem between training data and ASR-generated text. To address this, we propose a direct training approach that utilizes ASR-generated written or spoken text, with pairs augmented through ASR linguistic context emulation and a semi-supervised learning method enhanced by a large language model, respectively. Additionally, we introduce a post-aligning method to…
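To make the spoken-form to written-form conversion concrete, here is a minimal rule-based sketch in Python. It is not the paper's neural method; it is a toy illustration that only handles English number words from 0 to 99. The function name `toy_itn` and the lookup tables are hypothetical names introduced for this example.

```python
# Toy rule-based ITN sketch (illustrative only, NOT the paper's neural approach).
# Assumption: input is lowercase-able English text; only number words 0-99 are handled.

UNITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19,
}
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}


def toy_itn(spoken: str) -> str:
    """Convert spoken-form number words into written-form digits."""
    tokens = spoken.lower().split()
    out = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok in TENS:
            value = TENS[tok]
            # Merge a following single-digit word: "twenty" + "five" -> 25.
            if i + 1 < len(tokens) and 1 <= UNITS.get(tokens[i + 1], 0) <= 9:
                value += UNITS[tokens[i + 1]]
                i += 1
            out.append(str(value))
        elif tok in UNITS:
            out.append(str(UNITS[tok]))
        else:
            # Non-number words pass through unchanged (lowercased).
            out.append(tok)
        i += 1
    return " ".join(out)


print(toy_itn("i have twenty five apples"))
```

A rule table like this rewrites every "one" as "1", even in a phrase such as "one of them", where the written form should keep the word. Context-dependent cases of this kind are one reason ITN is commonly treated as a learned sequence task rather than a fixed set of rules.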
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
