LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models

Neel Guha; Julian Nyarko; Daniel E. Ho; Christopher Ré; Adam Chilton; Aditya Narayana; Alex Chohlas-Wood; Austin Peters; Brandon Waldon; Daniel N. Rockmore; Diego Zambrano; Dmitry Talisman; Enam Hoque; Faiz Surani; Frank Fagan; Galit Sarfaty; Gregory M. Dickinson; Haggai Porat; Jason Hegland; Jessica Wu; Joe Nudell; Joel Niklaus; John Nay; Jonathan H. Choi; Kevin Tobia; Margaret Hagan; Megan Ma; Michael Livermore; Nikon Rasumov-Rahe; Nils Holzenberger; Noam Kolt; Peter Henderson; Sean Rehaag; Sharad Goel; Shang Gao; Spencer Williams; Sunny Gandhi; Tom Zur; Varun Iyer; and Zehua Li

arXiv:2308.11462 · cs.CL · August 23, 2023 · 28 cites

Open Access · 1 Repo · 5 Datasets · 1 Video

TL;DR

LegalBench is a collaboratively built benchmark of 162 legal reasoning tasks for evaluating the legal reasoning capabilities of large language models and for fostering interdisciplinary dialogue about them.

Contribution

This paper introduces LegalBench, a new legal reasoning benchmark built with expert input, linking legal frameworks to LLM evaluation, and providing a platform for comprehensive legal AI research.

Findings

01

LegalBench covers six types of legal reasoning.

02

20 LLMs were empirically evaluated on the benchmark.

03

LegalBench facilitates interdisciplinary discussions on legal AI.

Abstract

The advent of large language models (LLMs) and their adoption by the legal community has given rise to the question: what types of legal reasoning can LLMs perform? To enable greater study of this question, we present LegalBench: a collaboratively constructed legal reasoning benchmark consisting of 162 tasks covering six different types of legal reasoning. LegalBench was built through an interdisciplinary process, in which we collected tasks designed and hand-crafted by legal professionals. Because these subject matter experts took a leading role in construction, tasks either measure legal reasoning capabilities that are practically useful, or measure reasoning skills that lawyers find interesting. To enable cross-disciplinary conversations about LLMs in the law, we additionally show how popular legal frameworks for describing legal reasoning -- which distinguish between its many forms…

Code & Models

Repositories

hazyresearch/legalbench
none · Official

Datasets

Videos

LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models · SlidesLive

Taxonomy

Topics: Artificial Intelligence in Law · Comparative and International Law Studies · Legal Language and Interpretation