Ready Jurist One: Benchmarking Language Agents for Legal Intelligence in Dynamic Environments

Zheng Jia; Shengbin Yue; Wei Chen; Siyuan Wang; Yidong Liu; Zejun Li; Yun Song; Zhongyu Wei

arXiv:2507.04037 · cs.AI · January 26, 2026

TL;DR

This paper introduces an interactive legal environment and an evaluation framework for assessing language agents' legal reasoning and procedural skills in dynamic, real-world scenarios. It finds that current models still fall short on procedural execution.

Contribution

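It presents two tools. J1-ENVS is an interactive, dynamic legal environment for LLM-based agents. J1-EVAL is a fine-grained framework that scores both task performance and procedural compliance. Together they expose gaps in models' procedural capabilities.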

Findings

01

Many models show solid legal knowledge but struggle with procedural execution in dynamic settings.

02

Even GPT-4o, the state-of-the-art model, scores below 60% overall.

03

Dynamic legal intelligence remains a significant challenge.

Abstract

The gap between static benchmarks and the dynamic nature of real-world legal practice poses a key barrier to advancing legal intelligence. To this end, we introduce J1-ENVS, the first interactive and dynamic legal environment tailored for LLM-based agents. Guided by legal experts, it comprises six representative scenarios from Chinese legal practices across three levels of environmental complexity. We further introduce J1-EVAL, a fine-grained evaluation framework, designed to assess both task performance and procedural compliance across varying levels of legal proficiency. Extensive experiments on 17 LLM agents reveal that, while many models demonstrate solid legal knowledge, they struggle with procedural execution in dynamic settings. Even the SOTA model, GPT-4o, falls short of 60% overall performance. These findings highlight persistent challenges in achieving dynamic legal…

Code & Models

Datasets

CimoInkPool/J1-Eval_Dataset

Taxonomy

Topics: Artificial Intelligence in Law · Multi-Agent Systems and Negotiation