Open Ko-LLM Leaderboard2: Bridging Foundational and Practical Evaluation for Korean LLMs

Hyeonwoo Kim; Dahyun Kim; Jihoo Kim; Sukyung Lee; Yungi Kim; Chanjun Park

arXiv:2410.12445 · cs.CL · March 5, 2025

TL;DR

Open Ko-LLM Leaderboard2 revises Korean LLM evaluation by replacing the original leaderboard's benchmarks with tasks aligned to real-world capabilities and by adding four native Korean benchmarks, with the aim of making leaderboard scores more meaningful.

Contribution

The paper introduces a revised leaderboard whose new tasks and native Korean benchmarks are intended to better evaluate the practical and linguistic capabilities of Korean LLMs.

Findings

1. New tasks better reflect real-world Korean language use
2. Native benchmarks capture unique Korean linguistic features
3. Evaluation correlates more closely with practical impact

Abstract

The Open Ko-LLM Leaderboard has been instrumental in benchmarking Korean Large Language Models (LLMs), yet it has certain limitations. Notably, the disconnect between quantitative improvements on the overly academic leaderboard benchmarks and the qualitative impact of the models should be addressed. Furthermore, the benchmark suite is largely composed of translated versions of their English counterparts, which may not fully capture the intricacies of the Korean language. To address these issues, we propose Open Ko-LLM Leaderboard2, an improved version of the earlier Open Ko-LLM Leaderboard. The original benchmarks are entirely replaced with new tasks that are more closely aligned with real-world capabilities. Additionally, four new native Korean benchmarks are introduced to better reflect the distinct characteristics of the Korean language. Through these refinements, Open Ko-LLM…

Taxonomy

Topics: Research Data Management Practices · Digital Rights Management and Security · Library Science and Information Systems