Thunder-LLM: Efficiently Adapting LLMs to Korean with Minimal Resources

Jinpyo Kim; Gyeongje Cho; Chanwoo Park; Jongwon Park; Jongmin Kim; Yeonkyoun So; Jaejin Lee

arXiv:2506.21595 · cs.CL · June 30, 2025

TL;DR

This paper introduces Thunder-LLM, a cost-effective method for adapting English-based large language models to Korean. The resulting models are reported to outperform existing Korean LLMs while using minimal data and compute.

Contribution

It presents a complete end-to-end process for low-resource language adaptation of LLMs, including data collection, training, and evaluation, with publicly available code.

Findings

01. Thunder-LLM models outperform existing Korean LLMs
02. Effective adaptation achieved with minimal data and compute
03. Comprehensive methodology shared for low-resource language adaptation

Abstract

Since state-of-the-art LLMs often underperform in languages other than English or Chinese, improving the capability of LLMs in new languages has become an essential task. Moreover, LLMs' entire end-to-end training process remains largely unknown to the public due to proprietary reasons, technical complexity, inconsistent documentation, and ethical considerations. The complete picture remains a closely guarded secret within the industry. This paper presents methods to adapt an existing English-based LLM to Korean in a low-budget scenario. We describe the entire end-to-end process: collecting Korean datasets, preprocessing the data, training the model, creating downstream benchmarks, and conducting evaluations. The evaluation results indicate that our method can effectively and cost-efficiently add new language capabilities to existing LLMs. Our new bilingual models, Thunder-LLM and…

Code & Models

Models

thunder-research-group/Llama-Thunder-LLM-8B
model on Hugging Face · 84 downloads · 8 likes
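For readers who want to try the listed checkpoint, the following is a minimal sketch rather than the authors' documented usage. It assumes the repository ID above can be loaded as a standard causal language model through the Hugging Face `transformers` Auto classes, which the listing itself does not confirm. The helper names `generation_kwargs` and `generate` are hypothetical and introduced only for illustration.

```python
# Repository ID exactly as it appears in the model listing.
MODEL_ID = "thunder-research-group/Llama-Thunder-LLM-8B"


def generation_kwargs(max_new_tokens: int = 64) -> dict:
    """Hypothetical helper: greedy-decoding settings for a quick smoke test."""
    return {"max_new_tokens": max_new_tokens, "do_sample": False}


def generate(prompt: str, max_new_tokens: int = 64) -> str:
    """Load the checkpoint and return the decoded continuation of `prompt`.

    Assumption: the checkpoint is compatible with AutoModelForCausalLM.
    Calling this downloads the full model weights on first use.
    """
    # Imported lazily so the module can be imported without transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    # Tokenize to PyTorch tensors and decode greedily.
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, **generation_kwargs(max_new_tokens))
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("대한민국의 수도는")` would download the weights and return the prompt followed by the model's continuation. An 8B-parameter model needs substantial memory, so a GPU or a reduced-precision load is likely necessary in practice.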

Taxonomy

Topics: Library Science and Information Systems · Natural Language Processing Techniques · Artificial Intelligence Applications