Korean Canonical Legal Benchmark: Toward Knowledge-Independent Evaluation of LLMs' Legal Reasoning Capabilities

Hongseok Oh; Wonseok Hwang; Kyoung-Woon On

arXiv:2512.24572·cs.CL·January 6, 2026

TL;DR

The paper introduces the Korean Canonical Legal Benchmark (KCL), which evaluates language models' legal reasoning independently of domain knowledge by supplying question-level precedents in two question formats: multiple-choice (KCL-MCQA) and open-ended essay (KCL-Essay).

Contribution

It presents a novel benchmark with question-level precedents for disentangling reasoning from knowledge, and provides systematic evaluation of 30+ models highlighting current gaps.

Findings

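01. Reasoning-specialized models consistently outperform their general-purpose counterparts.

02. Large gaps remain in legal reasoning capabilities, particularly on KCL-Essay.

03. All KCL resources, including the benchmark dataset and evaluation code, are publicly released.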

Abstract

We introduce the Korean Canonical Legal Benchmark (KCL), a benchmark designed to assess language models' legal reasoning capabilities independently of domain-specific knowledge. KCL provides question-level supporting precedents, enabling a more faithful disentanglement of reasoning ability from parameterized knowledge. KCL consists of two components: (1) KCL-MCQA, multiple-choice problems of 283 questions with 1,103 aligned precedents, and (2) KCL-Essay, open-ended generation problems of 169 questions with 550 aligned precedents and 2,739 instance-level rubrics for automated evaluation. Our systematic evaluation of 30+ models shows large remaining gaps, particularly in KCL-Essay, and that reasoning-specialized models consistently outperform their general-purpose counterparts. We release all resources, including the benchmark dataset and evaluation code, at https://github.com/lbox-kr/kcl.
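
To make the scale of the two components concrete, the counts in the abstract can be turned into per-question averages. The following is a minimal sketch: the numbers are copied from the abstract, while the dictionary layout and the function name `per_question_averages` are assumptions made for illustration and are not part of the released evaluation code.

```python
# Counts copied from the abstract; the dictionary layout is illustrative only.
KCL_COUNTS = {
    "KCL-MCQA": {"questions": 283, "precedents": 1103},
    "KCL-Essay": {"questions": 169, "precedents": 550, "rubrics": 2739},
}


def per_question_averages(counts):
    """Return the average number of each aligned resource per question."""
    averages = {}
    for component, c in counts.items():
        n_questions = c["questions"]
        averages[component] = {
            f"{name}_per_question": value / n_questions
            for name, value in c.items()
            if name != "questions"
        }
    return averages


stats = per_question_averages(KCL_COUNTS)
for component, values in stats.items():
    for name, value in values.items():
        print(f"{component}: {name} = {value:.2f}")
```

Under these counts, each multiple-choice question comes with roughly 3.9 aligned precedents, and each essay question with roughly 3.3 precedents and 16.2 rubric items on average.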

Code & Models

Datasets

lbox/kcl
dataset · 137 downloads

Videos

Korean Canonical Legal Benchmark: Toward Knowledge-Independent Evaluation of LLMs' Legal Reasoning Capabilities

Taxonomy

Topics: Artificial Intelligence in Law · Topic Modeling · Multi-Agent Systems and Negotiation