Do LLMs Truly Understand When a Precedent Is Overruled?
Li Zhang, Jaromir Savelka, Kevin Ashley

TL;DR
This paper evaluates how well large language models identify overruling relationships between U.S. Supreme Court cases. It reveals three limitations (era sensitivity, shallow reasoning, and context-dependent failures) and introduces a benchmark for assessing realistic long-context legal reasoning.
Contribution
The paper presents a long-context legal reasoning benchmark focused on overruling relationships and uses it to expose key limitations of current LLMs in complex legal understanding.
Findings
Models perform worse on historical cases, indicating temporal bias.
Models rely on shallow heuristics rather than deep legal reasoning.
Models fail in complex, context-dependent legal reasoning tasks.
Abstract
Large language models (LLMs) with extended context windows show promise for complex legal reasoning tasks, yet their ability to understand long legal documents remains insufficiently evaluated. Developing long-context benchmarks that capture realistic, high-stakes tasks remains a significant challenge in the field, as most existing evaluations rely on simplified synthetic tasks that fail to represent the complexity of real-world document understanding. Overruling relationships are foundational to common-law doctrine and commonly found in judicial opinions. They provide a focused and important testbed for long-document legal understanding that closely resembles what legal professionals actually do. We present an assessment of state-of-the-art LLMs on identifying overruling relationships from U.S. Supreme Court cases using a dataset of 236 case pairs. Our evaluation reveals three critical…
