Exploring Lexicon-Free Modeling Units for End-to-End Korean and Korean-English Code-Switching Speech Recognition
Jisung Wang, Jihwan Kim, Sangki Kim, Yeha Lee

TL;DR
This paper investigates lexicon-free modeling units for end-to-end Korean and Korean-English code-switching speech recognition using a hybrid CTC/Attention model, demonstrating that sub-word units based on Korean syllables outperform other units without needing a lexicon or language model.
Contribution
It introduces and evaluates five lexicon-free modeling units for Korean ASR, highlighting the effectiveness of sub-word units based on Korean syllables in end-to-end models.
Findings
Sub-word units based on Korean syllables perform best.
Lexicon-free models can achieve high accuracy without language models.
Universal byte units are also explored for multilingual applicability.
Abstract
As character-based end-to-end automatic speech recognition (ASR) models evolve, the choice of acoustic modeling units becomes important. Since Korean is a fairly phonetic language and has a unique writing system with its own Korean alphabet, it is worth investigating modeling units for an end-to-end Korean ASR task. In this work, we introduce lexicon-free modeling units in Korean, and explore them using a hybrid CTC/Attention-based encoder-decoder model. Five lexicon-free units are investigated: syllable-based Korean character (with English character for a code-switching task), Korean Jamo character (with English character), sub-word on syllable-based character (with sub-words in English), sub-word on Jamo character (with sub-words in English), and finally the byte unit, which is universal across languages. Experiments on Zeroth-Korean (51.6 hrs) and Medical Record (2530 hrs) are…
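To make the difference between the character-level units concrete, the following is a minimal sketch of how one code-switched string could be split into syllable characters, Jamo characters, and bytes. It is an illustration only, not the authors' preprocessing: the function names and the sample string are hypothetical, and Unicode NFD normalization is assumed here as one convenient way to obtain Jamo. The two sub-word variants would be learned on top of the syllable or Jamo sequences with a sub-word segmentation algorithm, which is omitted.

```python
import unicodedata


def syllable_units(text):
    """Split text into characters: one unit per precomposed Hangul syllable
    or English letter (whitespace is kept as a unit here for simplicity)."""
    return list(text)


def jamo_units(text):
    """Split text into Jamo. Assumption: NFD normalization is used to
    decompose each precomposed Hangul syllable into conjoining Jamo;
    English letters pass through unchanged."""
    return list(unicodedata.normalize("NFD", text))


def byte_units(text):
    """Split text into UTF-8 byte values (0-255), a language-independent
    inventory of at most 256 symbols."""
    return list(text.encode("utf-8"))


# Hypothetical code-switched sample: three Hangul syllables plus "ASR".
sample = "한국어 ASR"

for name, split in [("syllable", syllable_units),
                    ("jamo", jamo_units),
                    ("byte", byte_units)]:
    units = split(sample)
    print(f"{name:8s} {len(units):2d} units")
```

The sketch shows the trade-off among the units: finer units (Jamo, bytes) give a smaller symbol inventory but a longer target sequence for the same utterance, while syllable characters give shorter sequences over a much larger inventory.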
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and dialogue systems
