Open Ko-LLM Leaderboard2: Bridging Foundational and Practical Evaluation for Korean LLMs
Hyeonwoo Kim, Dahyun Kim, Jihoo Kim, Sukyung Lee, Yungi Kim, Chanjun Park

TL;DR
Open Ko-LLM Leaderboard2 revises Korean LLM evaluation by replacing the original benchmarks with tasks aligned to real-world capabilities and adding four native Korean benchmarks, aiming for more meaningful assessments.
Contribution
It introduces a revised leaderboard with new tasks and native benchmarks to better evaluate Korean LLMs' practical and linguistic capabilities.
Findings
New tasks better reflect real-world Korean language use
Native benchmarks capture unique Korean linguistic features
Evaluation correlates more closely with practical impact
Abstract
The Open Ko-LLM Leaderboard has been instrumental in benchmarking Korean Large Language Models (LLMs), yet it has certain limitations. Notably, the disconnect between quantitative improvements on the overly academic leaderboard benchmarks and the qualitative impact of the models should be addressed. Furthermore, the benchmark suite is largely composed of translated versions of their English counterparts, which may not fully capture the intricacies of the Korean language. To address these issues, we propose Open Ko-LLM Leaderboard2, an improved version of the earlier Open Ko-LLM Leaderboard. The original benchmarks are entirely replaced with new tasks that are more closely aligned with real-world capabilities. Additionally, four new native Korean benchmarks are introduced to better reflect the distinct characteristics of the Korean language. Through these refinements, Open Ko-LLM…
Taxonomy
Topics: Research Data Management Practices · Digital Rights Management and Security · Library Science and Information Systems
