ClaimGen-CN: A Large-scale Chinese Dataset for Legal Claim Generation

Siying Zhou; Yiquan Wu; Hui Chen; Xavier Hu; Kun Kuang; Adam Jatowt; Ming Hu; Chunyan Zheng; Fei Wu

arXiv:2508.17234 · cs.CL · October 28, 2025

TL;DR

This paper introduces ClaimGen-CN, a large-scale Chinese dataset for legal claim generation. It evaluates current large language models on the dataset and highlights the need for specialized development to improve the factual accuracy and clarity of generated claims.

Contribution

The paper builds the first Chinese legal claim generation dataset and designs an evaluation metric tailored to generated claims, covering factuality and clarity. It also provides a comprehensive zero-shot assessment of state-of-the-art general and legal-domain large language models.

Findings

01 Current models struggle with factual accuracy.
02 Models lack expressive clarity in the claims they generate.
03 The dataset will be made publicly available for future research.

Abstract

Legal claims are the plaintiff's demands in a case and are essential to guiding judicial reasoning and case resolution. While many works have focused on improving the efficiency of legal professionals, research on helping non-professionals (e.g., plaintiffs) remains unexplored. This paper explores the problem of generating legal claims from the facts of a given case. First, we construct ClaimGen-CN, the first dataset for the Chinese legal claim generation task, from various real-world legal disputes. Additionally, we design an evaluation metric tailored to assessing the generated claims, which covers two essential dimensions: factuality and clarity. Building on this, we conduct a comprehensive zero-shot evaluation of state-of-the-art general and legal-domain large language models. Our findings highlight the limitations of the current models in factual precision and expressive…

Videos

ClaimGen-CN: A Large-scale Chinese Dataset for Legal Claim Generation (video presentation on Underline)