Contract Discovery: Dataset and a Few-Shot Semantic Retrieval Challenge with Competitive Baselines
Łukasz Borchmann, Dawid Wiśniewski, Andrzej Gretkowski, Izabela Kosmala, Dawid Jurkiewicz, Łukasz Szałkiewicz, Gabriela Pałka, Karol Kaczmarek, Agnieszka Kaliska, Filip Graliński

TL;DR
This paper introduces contract discovery, a new legal text retrieval task, and provides a dataset and benchmarks. It finds that domain-specific language models with unsupervised fine-tuning outperform general pretrained encoders.
Contribution
The paper presents a novel dataset and challenge for legal contract discovery, along with baseline results and analysis of model performance, highlighting the effectiveness of domain-specific language models.
Findings
Language Model-based solutions outperform pretrained encoders.
Unsupervised fine-tuning improves retrieval accuracy.
Legal domain-specific LMs are publicly available.
Abstract
We propose a new shared task of semantic retrieval from legal texts, in which a so-called contract discovery is to be performed, where legal clauses are extracted from documents, given a few examples of similar clauses from other legal acts. The task differs substantially from conventional NLI and shared tasks on legal information extraction (e.g., one has to identify text span instead of a single document, page, or paragraph). The specification of the proposed task is followed by an evaluation of multiple solutions within the unified framework proposed for this branch of methods. It is shown that state-of-the-art pretrained encoders fail to provide satisfactory results on the task proposed. In contrast, Language Model-based solutions perform better, especially when unsupervised fine-tuning is applied. Besides the ablation studies, we addressed questions regarding detection accuracy for…
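The retrieval setting described in the abstract can be illustrated with a minimal sketch. This is not the authors' method: it assumes the document is already split into sentences, uses bag-of-words counts where the paper evaluates pretrained encoders and language models, and picks the span by nearest-neighbor cosine similarity to the seed clauses. The function names (`tokenize`, `embed`, `cosine`, `discover_span`) and the `max_len` parameter are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokenizer (a stand-in for a real encoder's tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(tokens):
    """Bag-of-words counts; a placeholder for LM-based segment embeddings."""
    return Counter(tokens)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def discover_span(sentences, seed_clauses, max_len=3):
    """Return (start, end, score) for the window of consecutive sentences
    most similar to any seed clause; `end` is exclusive."""
    seeds = [embed(tokenize(s)) for s in seed_clauses]
    best = (0, 0, -1.0)
    for start in range(len(sentences)):
        last = min(start + max_len, len(sentences))
        for end in range(start + 1, last + 1):
            window = embed(tokenize(" ".join(sentences[start:end])))
            # Nearest-neighbor score: similarity to the closest seed clause.
            score = max(cosine(window, seed) for seed in seeds)
            if score > best[2]:
                best = (start, end, score)
    return best
```

Scanning windows of varying length, rather than scoring whole documents or fixed paragraphs, reflects the task's requirement that a text span be identified. In practice the embedding function would be swapped for a stronger sentence or segment encoder.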
Taxonomy
Methods: k-Nearest Neighbors
