LAiW: A Chinese Legal Large Language Models Benchmark

Yongfu Dai; Duanyu Feng; Jimin Huang; Haochen Jia; Qianqian Xie; Yifang Zhang; Weiguang Han; Wei Tian; Hao Wang

arXiv:2310.05620·cs.CL·February 20, 2024·5 cites

Open Access · 1 Repo

TL;DR

This paper introduces LAiW, a benchmark for Chinese legal large language models that evaluates whether their legal reasoning aligns with legal practice. It finds that current models are limited on basic legal tasks and need their legal logic reinforced.

Contribution

The paper presents the first Chinese legal LLM benchmark built on the logic of legal practice, with a multi-level evaluation framework for comprehensive assessment.

Findings

01 LLMs perform poorly on basic legal tasks
02 Models show potential in complex legal applications
03 Legal experts confirm the need for improved legal reasoning in LLMs

Abstract

General and legal domain LLMs have demonstrated strong performance in various tasks of LegalAI. However, the current evaluations of these LLMs in LegalAI are defined by the experts of computer science, lacking consistency with the logic of legal practice, making it difficult to judge their practical capabilities. To address this challenge, we are the first to build the Chinese legal LLMs benchmark LAiW, based on the logic of legal practice. To align with the thinking process of legal experts and legal practice (syllogism), we divide the legal capabilities of LLMs from easy to difficult into three levels: basic information retrieval, legal foundation inference, and complex legal application. Each level contains multiple tasks to ensure a comprehensive evaluation. Through automated evaluation of current general and legal domain LLMs on our benchmark, we indicate that these LLMs may not…
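The three-level structure described in the abstract can be pictured as a mapping from tasks to capability levels, with results reported per level. The sketch below is only an illustration under stated assumptions: the task names (`task_a` to `task_d`) and scores are hypothetical placeholders, and averaging scores within a level is an assumed aggregation, not necessarily the one the paper uses.

```python
from collections import defaultdict
from enum import IntEnum
from statistics import mean


class Level(IntEnum):
    """The three capability levels named in the abstract, easy to difficult."""
    BASIC_INFORMATION_RETRIEVAL = 1
    LEGAL_FOUNDATION_INFERENCE = 2
    COMPLEX_LEGAL_APPLICATION = 3


def scores_by_level(task_levels, task_scores):
    """Average task scores within each level (assumed aggregation).

    task_levels: dict mapping task name -> Level
    task_scores: dict mapping task name -> score in [0, 1]
    """
    grouped = defaultdict(list)
    for task, score in task_scores.items():
        grouped[task_levels[task]].append(score)
    # Sort by level so the report reads from easy to difficult.
    return {level: mean(values) for level, values in sorted(grouped.items())}


# Hypothetical task names and made-up scores, for illustration only.
task_levels = {
    "task_a": Level.BASIC_INFORMATION_RETRIEVAL,
    "task_b": Level.BASIC_INFORMATION_RETRIEVAL,
    "task_c": Level.LEGAL_FOUNDATION_INFERENCE,
    "task_d": Level.COMPLEX_LEGAL_APPLICATION,
}
task_scores = {"task_a": 0.5, "task_b": 0.25, "task_c": 0.5, "task_d": 0.75}

report = scores_by_level(task_levels, task_scores)
for level, avg in report.items():
    print(f"{level.name}: {avg:.2f}")
```

Grouping by level rather than reporting a single overall number makes it possible to see a model that does well on complex applications while still struggling with basic tasks, which is the pattern the findings above describe.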

Code & Models

Repositories

dai-shen/laiw (Official)

Taxonomy

Topics: Artificial Intelligence in Law

Methods: ALIGN