CAIL2019-SCM: A Dataset of Similar Case Matching in Legal Domain
Chaojun Xiao, Haoxi Zhong, Zhipeng Guo, Cunchao Tu, Zhiyuan Liu, Maosong Sun, Tianyang Zhang, Xianpei Han, Zhen Hu, Heng Wang, and Jianfeng Xu

TL;DR
This paper introduces CAIL2019-SCM, a dataset of 8,964 Chinese legal case triplets for similar case matching, along with baseline models and a competitive benchmark to advance research in legal AI.
Contribution
The paper provides a new, sizable dataset for similar case matching in the legal domain and establishes baseline models and a competitive benchmark for future research.
Findings
The dataset contains 8,964 triplets of legal cases.
The best of the 711 participating teams achieved a score of 71.88.
Baseline models are provided for comparison.
Abstract
In this paper, we introduce CAIL2019-SCM, the Chinese AI and Law 2019 Similar Case Matching dataset. CAIL2019-SCM contains 8,964 triplets of cases published by the Supreme People's Court of China. CAIL2019-SCM focuses on detecting similar cases: participants are required to determine which two cases in each triplet are more similar. A total of 711 teams participated in this year's competition, and the best team reached a score of 71.88. We have also implemented several baselines to help researchers better understand this task. The dataset and more details can be found at https://github.com/china-ai-law-challenge/CAIL2019/tree/master/scm.
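To make the triplet task concrete, here is a minimal sketch of a similarity-based predictor and its scoring. It is not one of the paper's baselines. It assumes each triplet is a record with hypothetical fields `A`, `B`, `C`, and `label`, that the question is whether case A is closer to B or to C, and that the reported score is percentage accuracy; none of these details is stated in the abstract. The whitespace tokenizer and the English toy data are placeholders, since real Chinese case text would need a word segmenter or character n-grams.

```python
import math
from collections import Counter


def bag_of_words(text):
    # Placeholder tokenizer: splits on whitespace (assumption, not suited to raw Chinese text).
    return Counter(text.split())


def cosine(u, v):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(count * v[term] for term, count in u.items() if term in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def predict(triplet):
    # Return "B" if case A is at least as similar to B as to C, otherwise "C".
    a = bag_of_words(triplet["A"])
    b = bag_of_words(triplet["B"])
    c = bag_of_words(triplet["C"])
    return "B" if cosine(a, b) >= cosine(a, c) else "C"


def accuracy(triplets):
    # Percentage of triplets whose prediction matches the (assumed) gold label.
    correct = sum(predict(t) == t["label"] for t in triplets)
    return 100.0 * correct / len(triplets)


# Invented toy triplets for illustration only; they are not drawn from the dataset.
toy_triplets = [
    {
        "A": "loan dispute unpaid interest",
        "B": "loan dispute late repayment",
        "C": "traffic accident injury claim",
        "label": "B",
    },
    {
        "A": "borrower refused repayment",
        "B": "property boundary quarrel",
        "C": "borrower delayed repayment",
        "label": "C",
    },
]

if __name__ == "__main__":
    print(accuracy(toy_triplets))
```

A lexical-overlap predictor like this only gives a floor to compare against; the learned baselines mentioned in the abstract would replace `cosine` over word counts with a trained similarity model.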
Taxonomy
Topics: Artificial Intelligence in Law · Computational and Text Analysis Methods · Topic Modeling
