Open Ko-LLM Leaderboard: Evaluating Large Language Models in Korean with Ko-H5 Benchmark

Chanjun Park; Hyeonwoo Kim; Dahyun Kim; Seonghwan Cho; Sanghoon Kim; Sukyung Lee; Yungi Kim; Hwalsuk Lee

arXiv:2405.20574·cs.CL·August 20, 2024·1 cites

TL;DR

This paper presents the Open Ko-LLM Leaderboard and the Ko-H5 Benchmark for evaluating Korean large language models, emphasizing the value of private test sets and the broader goal of more linguistically diverse LLM evaluation.

Contribution

The paper introduces an evaluation framework and benchmark for Korean LLMs, built on private test sets and supported by data leakage, correlation, and temporal analyses.

Findings

01. Data leakage analysis shows that private test sets make evaluation more robust.

02. A correlation study within the Ko-H5 benchmark relates its component scores to one another.

03. Temporal analysis of the Ko-H5 score reveals trends in Korean LLM development.

Abstract

This paper introduces the Open Ko-LLM Leaderboard and the Ko-H5 Benchmark as vital tools for evaluating Large Language Models (LLMs) in Korean. Incorporating private test sets while mirroring the English Open LLM Leaderboard, we establish a robust evaluation framework that has been well integrated into the Korean LLM community. We perform a data leakage analysis that shows the benefit of private test sets, along with a correlation study within the Ko-H5 benchmark and temporal analyses of the Ko-H5 score. Moreover, we present empirical support for the need to expand beyond set benchmarks. We hope the Open Ko-LLM Leaderboard sets a precedent for expanding LLM evaluation to foster more linguistic diversity.

Taxonomy

Topics: Computational and Text Analysis Methods · Natural Language Processing Techniques · Topic Modeling

Methods: Sparse Evolutionary Training