From KMMLU-Redux to KMMLU-Pro: A Professional Korean Benchmark Suite for LLM Evaluation

Seokhee Hong; Sunkyoung Kim; Guijin Son; Soyeon Kim; Yeonjung Hong; Jinsik Lee

arXiv:2507.08924 · cs.CL · July 21, 2025

TL;DR

This paper introduces two expert-level Korean benchmarks, KMMLU-Redux and KMMLU-Pro, designed to evaluate large language models in academic and industrial contexts within Korea, emphasizing real-world applicability.

Contribution

The paper presents new Korean expert-level benchmarks, KMMLU-Redux and KMMLU-Pro, derived from national exams to better assess LLMs in professional and industrial domains.

Findings

01. Benchmarks effectively represent Korean industrial knowledge

02. Models show varying performance on these specialized benchmarks

03. Public dataset release facilitates further research

Abstract

The development of Large Language Models (LLMs) requires robust benchmarks that encompass not only academic domains but also industrial fields to effectively evaluate their applicability in real-world scenarios. In this paper, we introduce two Korean expert-level benchmarks. KMMLU-Redux, reconstructed from the existing KMMLU, consists of questions from the Korean National Technical Qualification exams, with critical errors removed to enhance reliability. KMMLU-Pro is based on Korean National Professional Licensure exams to reflect professional knowledge in Korea. Our experiments demonstrate that these benchmarks comprehensively represent industrial knowledge in Korea. We release our dataset publicly.

Taxonomy

Topics: Computational and Text Analysis Methods · Machine Learning in Materials Science · Machine Learning and Data Classification