TL;DR
CALRK-Bench is a new benchmark for evaluating context-aware legal reasoning in Korean law. It focuses on the temporal validity of legal norms, the sufficiency of available legal information, and shifts in legal judgments.
Contribution
It introduces CALRK-Bench, a novel dataset and evaluation framework for assessing models' understanding of dynamic legal contexts in Korean law.
Findings
Large language models perform poorly on context-aware legal reasoning tasks.
CALRK-Bench reveals the limitations of current models in understanding legal norm shifts.
The benchmark provides a stress test for evaluating legal reasoning beyond memorization.
Abstract
Legal reasoning requires not only the application of legal rules but also an understanding of the context in which those rules operate. However, existing legal benchmarks primarily evaluate rule application under the assumption of fixed norms, and thus fail to capture situations where legal judgments shift or where multiple norms interact. In this work, we propose CALRK-Bench, a context-aware legal reasoning benchmark based on the Korean legal system. CALRK-Bench evaluates whether models can identify the temporal validity of legal norms, determine whether sufficient legal information is available for a given case, and understand the reasons behind shifts in legal judgments. The dataset is constructed from legal precedents and legal consultation records, and is validated by legal experts. Experimental results show that even recent large language models consistently exhibit low…
