Korean Tokenization for Beam Search Rescoring in Speech Recognition

Kyuhong Shim; Hyewon Bae; Wonyong Sung

arXiv:2203.03583·cs.CL·March 29, 2022

TL;DR

This paper introduces a Korean tokenization method that inserts a special SkipTC token when a syllable has no trailing consonant. The method improves the external language model used for beam-search rescoring in speech recognition and lowers error rates on a large-scale dataset.

Contribution

A new Korean tokenization approach with a SkipTC token that gives the LM a more regular input pattern to learn from and improves ASR accuracy, especially on large-scale datasets.

Findings

01. Lower word error rate with proposed tokenization

02. Effective pattern regularization for language models

03. First ASR results on 7,600h Korean dataset

Abstract

The performance of automatic speech recognition (ASR) models can be greatly improved by proper beam-search decoding with an external language model (LM). Interest in Korean speech recognition has been increasing, but few studies have focused on the decoding procedure. In this paper, we propose a Korean tokenization method for the neural network-based LM used in Korean ASR. Although the common approach is to use the same tokenization method for the external LM as for the ASR model, we show that it may not be the best choice for Korean. We propose a new tokenization method that inserts a special token, SkipTC, when there is no trailing consonant in a Korean syllable. With the SkipTC token, the input sequence for the LM becomes highly regular, so the LM can better learn the linguistic characteristics. Our experiments show that the proposed approach achieves…
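The SkipTC idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the LM input is built by splitting each precomposed Hangul syllable into its leading consonant, vowel, and trailing consonant using the standard Unicode arithmetic, and that non-Hangul characters are passed through unchanged. The token spelling `<SkipTC>` and the function name `tokenize` are hypothetical.

```python
# Illustrative sketch only: the token spelling and the handling of
# non-Hangul characters are assumptions, not the paper's exact recipe.
SKIP_TC = "<SkipTC>"  # hypothetical spelling of the SkipTC token

HANGUL_BASE = 0xAC00  # first precomposed Hangul syllable
HANGUL_LAST = 0xD7A3  # last precomposed Hangul syllable
LEAD_BASE = 0x1100    # first leading-consonant jamo
VOWEL_BASE = 0x1161   # first vowel jamo
TAIL_BASE = 0x11A7    # one below the first trailing-consonant jamo
VOWEL_COUNT = 21
TAIL_COUNT = 28       # includes index 0, meaning "no trailing consonant"


def tokenize(text):
    """Split Hangul syllables into jamo, inserting SKIP_TC for a missing tail."""
    tokens = []
    for ch in text:
        code = ord(ch)
        if not (HANGUL_BASE <= code <= HANGUL_LAST):
            tokens.append(ch)  # assumption: keep other characters as-is
            continue
        index = code - HANGUL_BASE
        lead = index // (VOWEL_COUNT * TAIL_COUNT)
        vowel = (index % (VOWEL_COUNT * TAIL_COUNT)) // TAIL_COUNT
        tail = index % TAIL_COUNT
        tokens.append(chr(LEAD_BASE + lead))
        tokens.append(chr(VOWEL_BASE + vowel))
        # Every syllable yields exactly three tokens: the tail slot is
        # filled with SKIP_TC when there is no trailing consonant.
        tokens.append(chr(TAIL_BASE + tail) if tail else SKIP_TC)
    return tokens
```

Under these assumptions every Hangul syllable contributes exactly three tokens, which is the kind of regular pattern the abstract refers to. For example, `tokenize("한국어")` returns nine tokens, and the last one is `SKIP_TC` because 어 has no trailing consonant.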

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and Audio Processing