CAIL2019-SCM: A Dataset of Similar Case Matching in Legal Domain

Chaojun Xiao; Haoxi Zhong; Zhipeng Guo; Cunchao Tu; Zhiyuan Liu; Maosong Sun; Tianyang Zhang; Xianpei Han; Zhen Hu; Heng Wang; Jianfeng Xu

arXiv:1911.08962·cs.CL·November 26, 2019·35 cites

TL;DR

This paper introduces CAIL2019-SCM, a dataset of 8,964 Chinese legal case triplets for similar case matching, along with baseline models and results from the accompanying competition, to advance research in legal AI.

Contribution

The paper provides a new dataset for similar case matching in the legal domain, built from cases published by the Supreme People's Court of China, and establishes baseline models and a competition benchmark for future research.

Findings

01

The dataset contains 8,964 triplets of legal cases.

02

Of the 711 teams that participated in the competition, the top team achieved a score of 71.88.

03

Baseline models are provided for comparison.

Abstract

In this paper, we introduce CAIL2019-SCM, the Chinese AI and Law 2019 Similar Case Matching dataset. CAIL2019-SCM contains 8,964 triplets of cases published by the Supreme People's Court of China. CAIL2019-SCM focuses on detecting similar cases: for each triplet, participants are required to determine which two cases are more similar. A total of 711 teams participated in this year's competition, and the best team reached a score of 71.88. We have also implemented several baselines to help researchers better understand this task. The dataset and more details can be found at https://github.com/china-ai-law-challenge/CAIL2019/tree/master/scm.
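To make the triplet task concrete, here is a minimal sketch of how a prediction and its evaluation might look. It is not one of the paper's baselines. It assumes each triplet is a dictionary with hypothetical keys "A", "B", "C" (case texts) and "label" (either "B" or "C", naming the case more similar to A), and it assumes the score is plain accuracy, which the abstract does not state. Similarity is approximated with cosine similarity over character bigram counts, a deliberately simple stand-in chosen only because it needs no Chinese word segmenter.

```python
import math
from collections import Counter


def char_bigrams(text):
    """Count character bigrams; a tokenizer-free feature usable on Chinese text."""
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    dot = sum(count * v[key] for key, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def predict(triplet):
    """Return "B" if case B is at least as similar to case A as case C is, else "C".

    The keys "A", "B", "C" are an assumed layout, not confirmed by the source.
    """
    a, b, c = (char_bigrams(triplet[key]) for key in ("A", "B", "C"))
    return "B" if cosine(a, b) >= cosine(a, c) else "C"


def accuracy(triplets):
    """Fraction of triplets whose prediction matches the assumed "label" field."""
    correct = sum(predict(t) == t["label"] for t in triplets)
    return correct / len(triplets)


if __name__ == "__main__":
    # Toy placeholder strings, not real case documents.
    toy = [
        {"A": "abcdef", "B": "abcxyz", "C": "uvwxyz", "label": "B"},
        {"A": "abcdef", "B": "uvwxyz", "C": "abcdez", "label": "C"},
    ]
    print(accuracy(toy))
```

A learned model would replace `cosine` over bigram counts with a trained similarity function, while the comparison in `predict` and the scoring in `accuracy` would keep the same shape.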

Taxonomy

Topics: Artificial Intelligence in Law · Computational and Text Analysis Methods · Topic Modeling