LEXam: Benchmarking Legal Reasoning on 340 Law Exams

Yu Fan; Jingwei Ni; Jakob Merane; Yang Tian; Yoan Hermstrüwer; Yinya Huang; Mubashara Akhtar; Etienne Salimbeni; Florian Geering; Oliver Dreyer; Daniel Brunner; Markus Leippold; Mrinmaya Sachan; Alexander Stremitzer; Christoph Engel; Elliott Ash; Joel Niklaus

arXiv:2505.12864·cs.CL·April 3, 2026

TL;DR

LEXam is a comprehensive benchmark dataset of 340 law exams designed to evaluate and differentiate large language models' legal reasoning abilities through diverse question types and expert-validated evaluation methods.

Contribution

The paper introduces LEXam, a new benchmark dataset for legal reasoning, including detailed reasoning guidance and a scalable evaluation framework for LLMs.

Findings

01

Current LLMs struggle with open-ended legal reasoning questions.

02

The dataset effectively differentiates models with varying capabilities.

03

An ensemble of LLM judges shows promising alignment with human expert assessments.

Abstract

Long-form legal reasoning remains a key challenge for large language models (LLMs) despite recent advances in test-time scaling. To address this, we introduce LEXam, a novel benchmark derived from 340 law exams spanning 116 law school courses across a range of subjects and degree levels. The dataset comprises 7,537 law exam questions in English and German. It includes both long-form, open-ended questions and multiple-choice questions with varying numbers of options. Besides reference answers, the open questions are also accompanied by explicit guidance outlining the expected legal reasoning approach, such as issue spotting, rule recall, or rule application. Our evaluation shows that both open-ended and multiple-choice questions present significant challenges for current LLMs; in particular, they notably struggle with open questions that require structured, multi-step legal reasoning.…
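The abstract describes the data along a few dimensions: two languages, open versus multiple-choice questions, reference answers, and reasoning guidance for open questions. One way to picture this is as a record type. The sketch below uses a hypothetical schema, and its field names are assumptions rather than the released dataset's actual columns. Because the multiple-choice questions have varying numbers of options, the sketch also computes the expected accuracy of uniform random guessing. That value is the mean of 1/k over questions with k options, and it gives a floor for interpreting model scores.

```python
from dataclasses import dataclass, field
from typing import Literal, Optional


@dataclass
class ExamQuestion:
    """Hypothetical record for one exam question (field names are assumptions)."""
    question_id: str
    language: Literal["en", "de"]       # questions are in English or German
    kind: Literal["open", "mcq"]        # long-form open question or multiple choice
    text: str
    options: list[str] = field(default_factory=list)  # empty for open questions
    answer: Optional[str] = None        # reference answer (open) or correct option (mcq)
    reasoning_guidance: Optional[str] = None  # e.g. issue spotting; open questions only


def random_guess_accuracy(questions: list[ExamQuestion]) -> float:
    """Expected accuracy of picking one option uniformly at random per MCQ.

    Open questions are ignored. With varying option counts k_i, the
    baseline is the mean of 1 / k_i rather than a single fixed 1 / k.
    """
    mcqs = [q for q in questions if q.kind == "mcq"]
    if not mcqs:
        raise ValueError("no multiple-choice questions to score")
    return sum(1 / len(q.options) for q in mcqs) / len(mcqs)
```

For example, four questions with 4, 4, 8 and 16 options give a random-guess baseline of (1/4 + 1/4 + 1/8 + 1/16) / 4 = 0.171875. This is lower than the 0.25 baseline that would be assumed if every question had four options.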

Videos

LEXam: Benchmarking Legal Reasoning on 340 Law Exams (SlidesLive)