Evaluating Legal Reasoning Traces with Legal Issue Tree Rubrics

Jinu Lee; Kyoung-Woon On; Simeng Han; Arman Cohan; Julia Hockenmaier

arXiv:2512.01020 · cs.AI · May 4, 2026

TL;DR

This paper introduces LEGIT, a large-scale legal reasoning dataset with hierarchical issue trees, to evaluate and improve LLMs' legal reasoning through rubrics, retrieval augmentation, and reinforcement learning.

Contribution

The paper presents a novel dataset (LEGIT) and a rubric-based method for evaluating legal reasoning traces, and uses them to measure how retrieval-augmented generation (RAG) and reinforcement learning (RL) change LLMs' legal reasoning.

Findings

01. LLMs' legal reasoning is heavily influenced by issue coverage and correctness.

02. Retrieval-augmented generation improves overall reasoning ability.

03. Reinforcement learning enhances correctness but reduces issue coverage.

Abstract

Evaluating the quality of LLM-generated reasoning traces in expert domains (e.g., law) is essential for ensuring credibility and explainability, yet remains challenging due to the inherent complexity of such reasoning tasks. We introduce LEGIT (LEGal Issue Trees), a novel large-scale (24K instances) expert-level legal reasoning dataset with an emphasis on reasoning trace evaluation. We convert court judgments into hierarchical trees of opposing parties' arguments and the court's conclusions, which serve as rubrics for evaluating the issue coverage and correctness of the reasoning traces. We verify the reliability of these rubrics via human expert annotations and comparison with coarse, less informative rubrics. Using the LEGIT dataset, we show that (1) LLMs' legal reasoning ability is seriously affected by both legal issue coverage and correctness, and that (2) retrieval-augmented…
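To make the rubric idea concrete, the following is a minimal sketch of how an issue tree could be used to score a reasoning trace. It is not the paper's implementation: the `IssueNode` class, the `score_trace` function, the boolean encoding of the court's conclusion, and both metric definitions (coverage as the fraction of leaf issues the trace addresses, correctness as the fraction of addressed issues where the trace agrees with the court) are assumptions made for illustration. The per-issue verdicts of a trace are also assumed to be given; in practice they would have to be extracted from free text, for example by a judge model.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class IssueNode:
    """One node of a hypothetical legal issue tree."""
    issue_id: str
    description: str
    # Assumed encoding: True if the court accepted the point, False if it
    # rejected it, None for internal nodes that only group sub-issues.
    court_conclusion: Optional[bool] = None
    children: List["IssueNode"] = field(default_factory=list)


def leaves(node: IssueNode) -> List[IssueNode]:
    """Collect the leaf issues under a node, left to right."""
    if not node.children:
        return [node]
    collected: List[IssueNode] = []
    for child in node.children:
        collected.extend(leaves(child))
    return collected


def score_trace(root: IssueNode, trace_verdicts: Dict[str, bool]) -> Dict[str, float]:
    """Score a trace against the tree (illustrative metric definitions).

    trace_verdicts maps issue_id -> the conclusion the trace reached on it.
    """
    leaf_nodes = leaves(root)
    covered = [n for n in leaf_nodes if n.issue_id in trace_verdicts]
    correct = [n for n in covered if trace_verdicts[n.issue_id] == n.court_conclusion]
    coverage = len(covered) / len(leaf_nodes)
    correctness = len(correct) / len(covered) if covered else 0.0
    return {"coverage": coverage, "correctness": correctness}


# Toy tree: one claim, split into a breach question (two sub-issues) and damages.
tree = IssueNode("claim", "Plaintiff's claim for compensation", children=[
    IssueNode("breach", "Was there a breach?", children=[
        IssueNode("contract_valid", "Is the contract valid?", court_conclusion=True),
        IssueNode("duty_violated", "Was a contractual duty violated?", court_conclusion=True),
    ]),
    IssueNode("damages", "Were damages proven?", court_conclusion=False),
])

# The trace addresses two of the three leaf issues and gets one of them right.
scores = score_trace(tree, {"contract_valid": True, "damages": True})
print(scores)
```

In this toy case the trace covers two of three leaf issues and matches the court on one of the two, so coverage and correctness diverge; scoring them separately is what lets a rubric of this kind distinguish a trace that misses issues from one that addresses them incorrectly.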

Code & Models

Models

narcolepticchicken/legal-agent-micro-v5
model

Datasets

jinulee-v/legit_ko_verl
dataset · 306 downloads
