SCRIPT: A Subcharacter Compositional Representation Injection Module for Korean Pre-Trained Language Models

SungHo Kim, Juhyeong Park, Eda Atalay, and SangKeun Lee

arXiv:2604.12377·cs.CL·April 15, 2026

TL;DR

SCRIPT is a module that injects subcharacter compositional knowledge into Korean language models, improving their grasp of morphological and phonological structure without changing the model architecture.

Contribution

The paper introduces a model-agnostic module that enriches Korean pre-trained language models (PLMs) with subcharacter structural information, improving both their linguistic representations and their task performance.

Findings

01

Enhances Korean PLMs across NLU and NLG tasks.

02

Reshapes embedding space to better capture grammatical regularities.

03

Achieves performance gains without architectural changes.

Abstract

Korean is a morphologically rich language with a featural writing system in which each character is systematically composed of subcharacter units known as Jamo. These subcharacters not only determine the visual structure of Korean but also encode frequent and linguistically meaningful morphophonological processes. However, most current Korean language models (LMs) are based on subword tokenization schemes, which are not explicitly designed to capture the internal compositional structure of characters. To address this limitation, we propose SCRIPT, a model-agnostic module that injects subcharacter compositional knowledge into Korean PLMs. SCRIPT enriches subword embeddings with structural granularity, without requiring architectural changes or additional pre-training. As a result, SCRIPT enhances all baselines across various Korean natural language understanding (NLU) and…
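
To make the compositional structure mentioned in the abstract concrete, the sketch below splits a precomposed Hangul syllable into its Jamo using the standard Unicode arithmetic for the Hangul Syllables block (19 initial consonants, 21 vowels, 28 final slots including "no final"). This is not the SCRIPT module and is not taken from the authors' repository; the names `decompose` and `to_jamo` are hypothetical helpers chosen only for illustration.

```python
# Illustrative only: Unicode-based Jamo decomposition, not the SCRIPT module.
# Helper names (decompose, to_jamo) are hypothetical.

CHOSEONG = [chr(0x1100 + i) for i in range(19)]           # initial consonants
JUNGSEONG = [chr(0x1161 + i) for i in range(21)]          # vowels
JONGSEONG = [""] + [chr(0x11A8 + i) for i in range(27)]   # optional final consonants

SYLLABLE_BASE = 0xAC00             # first precomposed syllable, U+AC00
SYLLABLE_COUNT = 19 * 21 * 28      # 11,172 precomposed syllables


def decompose(ch):
    """Return (initial, vowel, final) for one precomposed syllable, else None."""
    index = ord(ch) - SYLLABLE_BASE
    if not 0 <= index < SYLLABLE_COUNT:
        return None
    lead, rest = divmod(index, 21 * 28)   # which initial consonant
    vowel, tail = divmod(rest, 28)        # which vowel, which final slot
    return CHOSEONG[lead], JUNGSEONG[vowel], JONGSEONG[tail]


def to_jamo(text):
    """Replace each Hangul syllable with its Jamo; leave other characters as-is."""
    pieces = []
    for ch in text:
        parts = decompose(ch)
        pieces.append("".join(parts) if parts else ch)
    return "".join(pieces)
```

For example, `decompose("한")` yields the three conjoining Jamo U+1112, U+1161, and U+11AB, while a syllable without a final consonant such as "가" returns an empty string in the third position. A subword tokenizer sees "한" as an opaque unit (or part of one), which is the gap the abstract describes.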

Code & Models

Repositories

SungHo3268/SCRIPT (GitHub)
