LAiW: A Chinese Legal Large Language Models Benchmark
Yongfu Dai, Duanyu Feng, Jimin Huang, Haochen Jia, Qianqian Xie, Yifang Zhang, Weiguang Han, Wei Tian, Hao Wang

TL;DR
This paper introduces LAiW, a benchmark for Chinese legal large language models that evaluates their legal reasoning in a way aligned with legal practice. It reveals current models' limitations on basic tasks and the need to reinforce their legal logic.
Contribution
The paper presents the first Chinese legal LLMs benchmark based on legal practice logic, with a multi-level evaluation framework for comprehensive assessment.
Findings
LLMs perform poorly on basic legal tasks
Models show potential in complex legal applications
Legal experts confirm the need for improved legal reasoning in LLMs
Abstract
General and legal domain LLMs have demonstrated strong performance in various tasks of LegalAI. However, the current evaluations of these LLMs in LegalAI are defined by the experts of computer science, lacking consistency with the logic of legal practice, making it difficult to judge their practical capabilities. To address this challenge, we are the first to build the Chinese legal LLMs benchmark LAiW, based on the logic of legal practice. To align with the thinking process of legal experts and legal practice (syllogism), we divide the legal capabilities of LLMs from easy to difficult into three levels: basic information retrieval, legal foundation inference, and complex legal application. Each level contains multiple tasks to ensure a comprehensive evaluation. Through automated evaluation of current general and legal domain LLMs on our benchmark, we indicate that these LLMs may not…
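The three-level structure described in the abstract can be illustrated with a minimal sketch that groups per-task scores by level. The level names come from the abstract; everything else is an assumption made for illustration: the function name `level_scores`, the placeholder task names, the example scores, and the choice of an unweighted mean per level, which the text above does not specify.

```python
from collections import defaultdict
from statistics import mean

# The three capability levels named in the abstract, from easy to difficult.
LEVELS = (
    "basic information retrieval",
    "legal foundation inference",
    "complex legal application",
)


def level_scores(task_results):
    """Average per-task scores within each level.

    task_results: iterable of (level, task_name, score) tuples.
    Returns a dict mapping every level in LEVELS to its mean score,
    or None when no task was reported for that level.

    Assumption: an unweighted mean per level; the benchmark's actual
    aggregation is not stated in the text above.
    """
    by_level = defaultdict(list)
    for level, _task, score in task_results:
        if level not in LEVELS:
            raise ValueError(f"unknown level: {level!r}")
        by_level[level].append(score)
    return {lvl: (mean(by_level[lvl]) if by_level[lvl] else None) for lvl in LEVELS}


if __name__ == "__main__":
    # Hypothetical task names and scores, for illustration only.
    results = [
        ("basic information retrieval", "task_a", 0.25),
        ("basic information retrieval", "task_b", 0.75),
        ("complex legal application", "task_c", 0.5),
    ]
    for lvl, score in level_scores(results).items():
        print(f"{lvl}: {score}")
```

Reporting one number per level, rather than a single overall score, keeps a weakness at the basic level visible even when a model does comparatively well on the complex level.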
Taxonomy
Topics: Artificial Intelligence in Law
Methods: ALIGN
