Handling Korean Out-of-Vocabulary Words with Phoneme Representation Learning

Nayeon Kim; Eojin Jeon; Jun-Hyung Park; SangKeun Lee

arXiv:2507.04018 · cs.CL · July 8, 2025

TL;DR

This paper presents KOPL, a phoneme-based framework for effectively handling Korean out-of-vocabulary words, improving NLP task performance by integrating phoneme and word representations.

Contribution

KOPL introduces a phoneme representation learning approach tailored to Korean. It improves the handling of OOV words and plugs into existing static and contextual Korean embedding models.

Findings

01. KOPL outperforms the previous state-of-the-art model by an average of 1.9%.

02. Its representations for Korean OOV words capture both the phoneme and the text information of words.

03. KOPL integrates into existing static and contextual Korean embedding models in a plug-and-play manner.

Abstract

In this study, we introduce KOPL, a novel framework for handling Korean out-of-vocabulary (OOV) words with Phoneme representation Learning. Our work builds on a linguistic property of Korean as a phonemic script: the high correlation between phonemes and letters. KOPL combines phoneme and word representations for Korean OOV words, so that the resulting OOV word representations capture both the text and the phoneme information of words. We empirically demonstrate that KOPL significantly improves performance on Korean Natural Language Processing (NLP) tasks, while being readily integrated into existing static and contextual Korean embedding models in a plug-and-play manner. Notably, we show that KOPL outperforms the state-of-the-art model by an average of 1.9%. Our code is available at https://github.com/jej127/KOPL.git.
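
To make the "phonemic script" property concrete, the sketch below splits precomposed Hangul syllables into their component letters (jamo) using the standard Unicode composition arithmetic. It is only an illustration of how closely Korean letters track sound units, not KOPL's actual pipeline: the function name `decompose` is hypothetical, and jamo are an orthographic proxy for phonemes, so a real system may well use a grapheme-to-phoneme step instead.

```python
# Illustrative sketch only (not taken from the KOPL code base).
# Precomposed Hangul syllables occupy U+AC00..U+D7A3 and are laid out as
#   code = 0xAC00 + (initial * 21 + medial) * 28 + final
CHOSEONG = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"            # 19 initial consonants
JUNGSEONG = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"        # 21 medial vowels
JONGSEONG = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")  # 28 finals, "" = none

HANGUL_BASE = 0xAC00
HANGUL_LAST = 0xD7A3


def decompose(text: str) -> list[str]:
    """Split each Hangul syllable into jamo; pass other characters through."""
    units = []
    for ch in text:
        code = ord(ch)
        if HANGUL_BASE <= code <= HANGUL_LAST:
            index = code - HANGUL_BASE
            initial, rest = divmod(index, 21 * 28)
            medial, final = divmod(rest, 28)
            units.append(CHOSEONG[initial])
            units.append(JUNGSEONG[medial])
            if JONGSEONG[final]:  # skip syllables with no final consonant
                units.append(JONGSEONG[final])
        else:
            units.append(ch)
    return units


if __name__ == "__main__":
    print(decompose("한국어"))
```

Because every syllable decomposes into a small, closed inventory of letters, a word missing from a model's vocabulary can still be described in terms of units the model has seen, which is the intuition behind learning representations at the phoneme level.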
