KOMBO: Korean Character Representations Based on the Combination Rules of Subcharacters

SungHo Kim; Juhyeong Park; Yeachan Kim; SangKeun Lee

arXiv:2604.23948 · cs.CL · April 28, 2026

TL;DR

KOMBO is a framework for Korean pre-trained language models (PLMs) that represents characters according to the invention principles of Hangeul. It outperforms the state-of-the-art Korean PLM by an average of 2.11% on five Korean natural language understanding tasks.

Contribution

The paper presents a new framework for Korean PLMs that encodes Hangeul's invention principles, emphasizing subcharacter features over subword units.
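
To make "subcharacter" concrete: every precomposed Hangeul syllable in Unicode can be split into an initial consonant, a medial vowel, and an optional final consonant by plain arithmetic on its code point. The sketch below shows only that standard Unicode decomposition. It is not taken from the KOMBO repository and does not implement the paper's combination rules, which this page does not describe; the names `decompose` and `compose` are illustrative.

```python
# Standard Unicode Hangul syllable arithmetic (illustrative, not KOMBO's code).
S_BASE = 0xAC00                      # first precomposed syllable, "가"
N_MEDIAL, N_FINAL = 21, 28           # 21 vowels, 27 finals + "no final"
N_SYLLABLES = 19 * N_MEDIAL * N_FINAL  # 11172 precomposed syllables

INITIALS = [chr(0x1100 + i) for i in range(19)]
MEDIALS = [chr(0x1161 + i) for i in range(N_MEDIAL)]
FINALS = [""] + [chr(0x11A8 + i) for i in range(N_FINAL - 1)]


def decompose(ch):
    """Split one Hangul syllable into (initial, medial, final) jamo.

    Characters outside the precomposed syllable block are returned unchanged
    as a one-element tuple.
    """
    offset = ord(ch) - S_BASE
    if not 0 <= offset < N_SYLLABLES:
        return (ch,)
    initial, rest = divmod(offset, N_MEDIAL * N_FINAL)
    medial, final = divmod(rest, N_FINAL)
    return (INITIALS[initial], MEDIALS[medial], FINALS[final])


def compose(initial, medial, final=""):
    """Inverse of decompose: build the precomposed syllable from its jamo."""
    index = (INITIALS.index(initial) * N_MEDIAL + MEDIALS.index(medial)) * N_FINAL
    return chr(S_BASE + index + FINALS.index(final))


if __name__ == "__main__":
    for syllable in "한글":
        print(syllable, decompose(syllable))
```

A subcharacter-level model would consume sequences like these instead of subword tokens; how KOMBO then combines them into character representations is specific to the paper and is not reproduced here.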

Findings

01 Outperforms the state-of-the-art Korean PLM by an average of 2.11% across five Korean natural language understanding tasks.

02 Demonstrates the effectiveness of subcharacter-based representations for Korean.

03 Shows improved understanding of Korean linguistic features.

Abstract

The Korean writing system, Hangeul, has a unique character representation that rigidly follows the invention principles recorded in Hunminjeongeum (a book published in 1446 that describes the principles of invention and usage of Hangeul, devised by King Sejong [Hunminjeongeum_Guide]). However, existing pre-trained language models (PLMs) for Korean have overlooked these principles. In this paper, we introduce KOMBO, a novel framework for Korean PLMs that is the first to bring the invention principles of Hangeul into character representation. KOMBO performs well across diverse NLP tasks. In particular, it outperforms the state-of-the-art Korean PLM by an average of 2.11% on five Korean natural language understanding tasks. Furthermore, extensive experiments…

Code & Models

Repositories

SungHo3268/KOMBO
github

Videos

KOMBO: Korean Character Representations Based on the Combination Rules of Subcharacters
underline