Clustering-based Automatic Construction of Legal Entity Knowledge Base from Contracts
Fuqi Song, Éric de la Clergerie

TL;DR
This paper introduces a clustering-based method to automatically build a legal entity knowledge base from contracts, effectively handling errors from OCR and NER, and achieving high recall on real-world data.
Contribution
It presents a novel approach for rapid, automated construction of legal entity knowledge bases directly from contracts without additional references.
Findings
Recalls 84% of legal entities in real contract data
Robust to OCR and NER errors and typos
Effective on contracts of varying quality
Abstract
In contract analysis and contract automation, a knowledge base (KB) of legal entities is fundamental for performing tasks such as contract verification, contract generation and contract analytics. However, such a KB does not always exist, nor can one always be produced in a short time. In this paper, we propose a clustering-based approach to automatically generate a reliable knowledge base of legal entities from given contracts without any supplemental references. The proposed method is robust to different types of errors introduced by pre-processing steps such as Optical Character Recognition (OCR) and Named Entity Recognition (NER), as well as editing errors such as typos. We evaluate our method on a dataset that consists of 800 real contracts of varying quality from 15 clients. Compared to the collected ground-truth data, our method is able to recall 84% of the knowledge.
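To make the general idea concrete, here is a minimal sketch of how noisy entity mentions might be grouped into KB entries. It is not the authors' algorithm, which the abstract does not specify. It assumes a greedy single-pass clustering over character-level string similarity (Python's standard `difflib`), a similarity threshold of 0.85 chosen arbitrarily, and the most frequent surface form as the canonical name. The function names and sample mentions are hypothetical.

```python
from collections import Counter
from difflib import SequenceMatcher


def normalize(mention: str) -> str:
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(mention.lower().split())


def similarity(a: str, b: str) -> float:
    """Character-level similarity ratio in [0, 1] between two mentions."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def cluster_mentions(mentions, threshold=0.85):
    """Greedy single-pass clustering (an assumption, not the paper's method).

    Each mention joins the first cluster whose first member is at least
    `threshold` similar to it; otherwise it starts a new cluster.
    """
    clusters = []
    for mention in mentions:
        for cluster in clusters:
            if similarity(mention, cluster[0]) >= threshold:
                cluster.append(mention)
                break
        else:
            clusters.append([mention])
    return clusters


def canonical_name(cluster):
    """Use the most frequent surface form as the cluster's KB entry name."""
    return Counter(cluster).most_common(1)[0][0]


# Hypothetical mentions, including OCR-style confusions ("rn" for "m", "5" for "S").
mentions = [
    "Acme Corporation",
    "Acme Corporation",
    "Acrne Corporation",
    "Globex SAS",
    "Globex SA5",
]
clusters = cluster_mentions(mentions)
knowledge_base = {canonical_name(c): c for c in clusters}
```

The intuition behind choosing the most frequent form is that correct spellings tend to recur across contracts while OCR and typing errors vary, so the majority form is a plausible canonical name. A real system would likely need a more careful similarity measure and clustering strategy than this sketch.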
