Exploring Lexicon-Free Modeling Units for End-to-End Korean and Korean-English Code-Switching Speech Recognition

Jisung Wang; Jihwan Kim; Sangki Kim; Yeha Lee

arXiv:1910.11590·cs.SD·October 28, 2019

TL;DR

This paper investigates lexicon-free modeling units for end-to-end Korean and Korean-English code-switching speech recognition using a hybrid CTC/Attention model, demonstrating that sub-word units based on Korean syllables outperform other units without needing a lexicon or language model.

Contribution

It introduces and evaluates five lexicon-free modeling units for Korean ASR, highlighting the effectiveness of sub-word units based on Korean syllables in end-to-end models.

Findings

01. Sub-word units based on Korean syllables perform best.

02. Lexicon-free models can achieve high accuracy without language models.

03. Universal byte units are also explored for multilingual applicability.

Abstract

As character-based end-to-end automatic speech recognition (ASR) models evolve, the choice of acoustic modeling units becomes important. Since Korean is a fairly phonetic language and has a unique writing system with its own alphabet, it is worth investigating modeling units for an end-to-end Korean ASR task. In this work, we introduce lexicon-free modeling units in Korean and explore them using a hybrid CTC/Attention-based encoder-decoder model. Five lexicon-free units are investigated: syllable-based Korean characters (with English characters for a code-switching task), Korean Jamo characters (with English characters), sub-words on syllable-based characters (with sub-words in English), sub-words on Jamo characters (with sub-words in English), and finally byte units, which are universal across languages. Experiments on Zeroth-Korean (51.6 hrs) and Medical Record (2530 hrs) are…
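To make the character-level units named in the abstract concrete, the following is a minimal sketch of how one transcript could be split into syllable, Jamo, and byte units. It is an illustration under stated assumptions, not the authors' preprocessing: the function names are hypothetical, the Jamo split assumes Unicode NFD decomposition (the paper may use a different Jamo inventory), and the two sub-word variants are omitted because they require a segmentation model trained on a corpus.

```python
import unicodedata


def syllable_units(text):
    # Assumption: one unit per code point, so each precomposed Hangul
    # syllable block, Latin letter, or space is its own symbol.
    return list(text)


def jamo_units(text):
    # Assumption: Unicode NFD splits each Hangul syllable block into
    # conjoining Jamo (initial consonant, vowel, optional final consonant).
    return list(unicodedata.normalize("NFD", text))


def byte_units(text):
    # UTF-8 byte values: a language-independent inventory of at most 256 symbols.
    return list(text.encode("utf-8"))


sample = "한국어 ASR"  # hypothetical code-switching transcript
print(len(syllable_units(sample)))  # 7 units
print(len(jamo_units(sample)))      # 12 units
print(len(byte_units(sample)))      # 13 units
```

The same utterance becomes a longer target sequence as the unit gets finer, while the symbol inventory shrinks: thousands of possible syllable blocks, a few dozen Jamo, or 256 byte values. Recomposing Jamo output with `unicodedata.normalize("NFC", ...)` recovers the original syllable blocks.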

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and dialogue systems