LegalCiteBench: Evaluating Citation Reliability in Legal Language Models

Sijia Chen; Hang Yin; Shunfan Zhou

arXiv:2605.10186·cs.CL·May 12, 2026

TL;DR

LegalCiteBench is a benchmark that tests whether legal language models can recover, verify, and match case citations in a closed-book setting. Evaluated models score poorly and return incorrect citations at high rates.

Contribution

The paper introduces LegalCiteBench, a benchmark of approximately 24K evaluation instances for studying citation-centric tasks in legal LLMs, and uses it to show how unreliable these models are when asked to supply case authorities without external grounding.

Findings

01

Models perform poorly on citation retrieval and completion, scoring below 7/100.

02

Misleading Answer Rates (MAR) above 94% indicate that models frequently return incorrect citations.

03

Explicit uncertainty instructions reduce confident fabrication but do not improve correctness.
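
This summary does not define how the Misleading Answer Rate is computed, so the following is only a rough sketch under one assumed reading: MAR is the share of responses in which the model commits to a citation that does not match the gold citation, with abstentions counted as not misleading. The `Response` type, the `normalize` helper, and the exact-match comparison are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Response:
    """One model answer paired with its reference citation (hypothetical schema)."""
    predicted: Optional[str]  # None means the model abstained
    gold: str


def normalize(citation: str) -> str:
    # Assumed normalization: lowercase and collapse runs of whitespace.
    return " ".join(citation.lower().split())


def misleading_answer_rate(responses: list[Response]) -> float:
    """Fraction of responses that give a concrete but non-matching citation.

    This is an assumed definition for illustration, not the paper's.
    """
    if not responses:
        raise ValueError("need at least one response")
    misleading = sum(
        1
        for r in responses
        if r.predicted is not None and normalize(r.predicted) != normalize(r.gold)
    )
    return misleading / len(responses)


sample = [
    Response(predicted="347 U.S. 483", gold="347 U.S. 483"),  # correct
    Response(predicted="410 U.S. 113", gold="347 U.S. 483"),  # wrong citation
    Response(predicted="5 U.S. 137", gold="384 U.S. 436"),    # wrong citation
    Response(predicted=None, gold="384 U.S. 436"),            # abstained
]
mar = misleading_answer_rate(sample)
```

Under this reading, an instruction that makes a model abstain more often lowers MAR without raising the number of correct answers, which is consistent with finding 03.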

Abstract

Large language models (LLMs) are increasingly integrated into legal drafting and research workflows, where incorrect citations or fabricated precedents can cause serious professional harm. Existing legal benchmarks largely emphasize statutory reasoning, contract understanding, or general legal question answering, but they do not directly study a central common-law failure mode: when asked to provide case authorities without external grounding, models may return plausible-looking but incorrect citations or cases. We introduce LegalCiteBench, a benchmark for studying closed-book citation recovery, citation verification, and case matching in legal language models. LegalCiteBench contains approximately 24K evaluation instances constructed from 1,000 real U.S. judicial opinions from the Case Law Access Project. The benchmark covers five citation-centric tasks: citation retrieval, citation…
