Optimizing Korean-Centric LLMs via Token Pruning
Hoyeol Kim, Hyeonwoo Kim

TL;DR
This paper systematically benchmarks multilingual LLMs with token pruning for Korean NLP, showing improved stability, translation performance, and memory efficiency, with architecture-dependent effects on instruction following.
Contribution
The paper demonstrates that token pruning is an effective way to optimize multilingual LLMs for Korean tasks, improving stability and efficiency while maintaining performance.
Findings
Token pruning improves generation stability by reducing language confusion.
Machine translation performance on Korean tasks is often enhanced by token pruning.
Vocabulary reduction leads to significant memory savings with modest latency gains.
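The memory finding follows from simple arithmetic. The embedding table, and the output head when it is not tied to the embeddings, stores one hidden-size row per vocabulary entry, so its size grows linearly with vocabulary size. The minimal sketch below uses hypothetical vocabulary sizes, hidden size and precision chosen only for illustration. They are not figures from the paper, and the function names are likewise hypothetical.

```python
def embedding_bytes(vocab_size: int, hidden_dim: int,
                    bytes_per_param: int = 2, tied: bool = True) -> int:
    """Storage for the input embedding, plus the LM head when it is untied."""
    matrices = 1 if tied else 2
    return matrices * vocab_size * hidden_dim * bytes_per_param


def pruning_savings(orig_vocab: int, pruned_vocab: int, hidden_dim: int,
                    bytes_per_param: int = 2, tied: bool = True) -> tuple[int, float]:
    """Return (bytes saved, fraction of embedding storage saved)."""
    before = embedding_bytes(orig_vocab, hidden_dim, bytes_per_param, tied)
    after = embedding_bytes(pruned_vocab, hidden_dim, bytes_per_param, tied)
    return before - after, 1 - after / before


# Hypothetical configuration: 256k -> 64k vocabulary, hidden size 2048,
# bf16 weights (2 bytes each), separate (untied) input and output matrices.
saved, fraction = pruning_savings(256_000, 64_000, 2048, bytes_per_param=2, tied=False)
print(f"saved {saved / 1e9:.2f} GB ({fraction:.0%} of embedding storage)")
```

This arithmetic covers only embedding and LM-head storage. The transformer layers are unaffected by vocabulary pruning, and the sketch does not model latency.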
Abstract
This paper presents a systematic benchmark of state-of-the-art multilingual large language models (LLMs) adapted via token pruning, a compression technique that removes the tokens and embedding parameters associated with languages irrelevant to the target application. Focusing on Korean-centric natural language processing (NLP) tasks, we evaluate architectures including Qwen3, Gemma-3, Llama-3, and Aya across three vocabulary configurations: Original, English-Korean (EnKo), and English-Korean-Chinese (EnKoZh). Performance is assessed using established benchmarks for general aptitude, cultural literacy, instruction following, and machine translation. Our findings indicate that token pruning significantly improves generation stability by eliminating language confusion and, in the case of machine translation, frequently enhances performance on Korean-specific tasks. While…
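The abstract describes token pruning only at a high level. To illustrate the idea, here is a minimal sketch that builds an EnKo or EnKoZh vocabulary and slices the embedding rows to match. It assumes each token is available as its decoded string and uses simple Unicode code-point ranges as a stand-in for language identification. The function names, the script rules and the toy vocabulary are hypothetical and are not taken from the paper.

```python
import unicodedata

SP_MARKER = "\u2581"  # SentencePiece word-boundary marker "▁"


def is_allowed_char(ch: str, scripts: frozenset[str]) -> bool:
    """True if ch is ASCII (English), punctuation, or in a requested script."""
    cp = ord(ch)
    if cp < 0x80 or ch == SP_MARKER:
        return True  # ASCII letters, digits, punctuation, whitespace
    if unicodedata.category(ch).startswith("P"):
        return True  # keep punctuation from any script
    if "ko" in scripts and (0xAC00 <= cp <= 0xD7A3      # Hangul syllables
                            or 0x1100 <= cp <= 0x11FF   # Hangul Jamo
                            or 0x3130 <= cp <= 0x318F): # Hangul compatibility Jamo
        return True
    if "zh" in scripts and 0x4E00 <= cp <= 0x9FFF:      # CJK unified ideographs
        return True
    return False


def prune_vocab(vocab, embeddings, scripts, special_tokens=()):
    """Keep tokens made only of allowed characters; slice embedding rows to match.

    Returns the pruned vocabulary, the pruned embedding rows, and an
    old-id -> new-id mapping for remapping token ids.
    """
    keep = [i for i, tok in enumerate(vocab)
            if tok in special_tokens or all(is_allowed_char(c, scripts) for c in tok)]
    old_to_new = {old: new for new, old in enumerate(keep)}
    return [vocab[i] for i in keep], [embeddings[i] for i in keep], old_to_new


# Toy vocabulary (hypothetical) with one 4-dim embedding row per token id.
VOCAB = ["<pad>", "<eos>", "\u2581the", "\u2581한국", "어",
         "\u2581中国", "\u2581русский", "ね", "!"]
EMBEDDINGS = [[float(i)] * 4 for i in range(len(VOCAB))]
SPECIAL = {"<pad>", "<eos>"}

if __name__ == "__main__":
    enko, _, _ = prune_vocab(VOCAB, EMBEDDINGS, frozenset({"ko"}), SPECIAL)
    enkozh, _, _ = prune_vocab(VOCAB, EMBEDDINGS, frozenset({"ko", "zh"}), SPECIAL)
    print("EnKo:  ", enko)
    print("EnKoZh:", enkozh)
```

A practical pipeline would need three more steps, all omitted here:

- Decode byte-level BPE tokens to bytes before checking their script.
- Rebuild the tokenizer so it never emits a removed id.
- Apply the same row selection to an untied output head.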
