Korean Canonical Legal Benchmark: Toward Knowledge-Independent Evaluation of LLMs' Legal Reasoning Capabilities
Hongseok Oh, Wonseok Hwang, Kyoung-Woon On

TL;DR
The paper introduces the Korean Canonical Legal Benchmark (KCL), which evaluates language models' legal reasoning independently of domain knowledge by pairing each question with supporting precedents, in two formats: multiple-choice (KCL-MCQA) and open-ended essay (KCL-Essay).
Contribution
The paper presents a benchmark that pairs each question with supporting precedents, so that reasoning ability can be separated from parameterized knowledge, and reports a systematic evaluation of 30+ models that shows large remaining gaps, particularly on KCL-Essay.
Findings
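Reasoning-specialized models consistently outperform their general-purpose counterparts.
Large gaps in legal reasoning remain, particularly on KCL-Essay.
All resources, including the benchmark dataset and evaluation code, are publicly released at https://github.com/lbox-kr/kcl.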
Abstract
We introduce the Korean Canonical Legal Benchmark (KCL), a benchmark designed to assess language models' legal reasoning capabilities independently of domain-specific knowledge. KCL provides question-level supporting precedents, enabling a more faithful disentanglement of reasoning ability from parameterized knowledge. KCL consists of two components: (1) KCL-MCQA, multiple-choice problems of 283 questions with 1,103 aligned precedents, and (2) KCL-Essay, open-ended generation problems of 169 questions with 550 aligned precedents and 2,739 instance-level rubrics for automated evaluation. Our systematic evaluation of 30+ models shows large remaining gaps, particularly in KCL-Essay, and that reasoning-specialized models consistently outperform their general-purpose counterparts. We release all resources, including the benchmark dataset and evaluation code, at https://github.com/lbox-kr/kcl.
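To put the counts in the abstract in proportion, the short sketch below derives per-question averages from them. Only the six totals come from the abstract; the function name `per_question` and the dictionary keys are illustrative choices, and the averages say nothing about how precedents or rubrics are actually distributed across individual questions.

```python
# Totals quoted in the abstract.
MCQA_QUESTIONS = 283
MCQA_PRECEDENTS = 1103
ESSAY_QUESTIONS = 169
ESSAY_PRECEDENTS = 550
ESSAY_RUBRICS = 2739


def per_question(total: int, questions: int) -> float:
    """Return the mean number of items attached to each question."""
    if questions <= 0:
        raise ValueError("questions must be positive")
    return total / questions


# Hypothetical key names, used only for this illustration.
averages = {
    "mcqa_precedents": per_question(MCQA_PRECEDENTS, MCQA_QUESTIONS),
    "essay_precedents": per_question(ESSAY_PRECEDENTS, ESSAY_QUESTIONS),
    "essay_rubrics": per_question(ESSAY_RUBRICS, ESSAY_QUESTIONS),
}

for name, value in averages.items():
    print(f"{name}: {value:.2f}")
```

On these totals, each KCL-MCQA question has roughly 3.9 aligned precedents on average, and each KCL-Essay question has roughly 3.3 precedents and about 16 rubrics.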
Taxonomy
Topics: Artificial Intelligence in Law · Topic Modeling · Multi-Agent Systems and Negotiation
