Clustering-based Automatic Construction of Legal Entity Knowledge Base from Contracts

Fuqi Song, Éric de la Clergerie

arXiv:2012.01942·cs.CL·March 30, 2021

TL;DR

This paper introduces a clustering-based method that automatically builds a legal entity knowledge base from contracts. The method is robust to OCR and NER errors and recalls 84% of the ground-truth knowledge on a dataset of 800 real contracts.

Contribution

It presents a novel approach for rapid, automated construction of legal entity knowledge bases directly from contracts without additional references.

Findings

01

Recalls 84% of legal entities in real contract data

02

Robust to OCR and NER errors and typos

03

Effective on contracts of varying quality

Abstract

In contract analysis and contract automation, a knowledge base (KB) of legal entities is fundamental for performing tasks such as contract verification, contract generation and contract analytics. However, such a KB does not always exist, nor can one always be produced in a short time. In this paper, we propose a clustering-based approach to automatically generate a reliable knowledge base of legal entities from given contracts without any supplemental references. The proposed method is robust to different types of errors introduced by pre-processing steps such as Optical Character Recognition (OCR) and Named Entity Recognition (NER), as well as editing errors such as typos. We evaluate our method on a dataset of 800 real contracts of varying quality from 15 clients. Compared to the collected ground-truth data, our method is able to recall 84% of the knowledge.
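
The abstract does not specify the clustering algorithm or similarity measure, so the following is only a minimal sketch of the general idea: grouping noisy mentions of the same legal entity (OCR confusions, typos, stray punctuation) and keeping the most frequent spelling as the canonical entry. Everything here is an assumption made for illustration, not the authors' method: the function names (`normalize`, `similarity`, `cluster_mentions`, `build_kb`), the greedy single-pass strategy, the use of Python's `difflib` ratio as the similarity score, and the `0.85` threshold.

```python
from collections import Counter
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not matter."""
    return " ".join(name.lower().split())


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] (difflib's ratio, an assumed stand-in)."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def cluster_mentions(mentions, threshold=0.85):
    """Greedy single-pass clustering: attach each mention to the first cluster
    whose first member is similar enough, otherwise start a new cluster."""
    clusters = []  # each cluster is a list of raw mention strings
    for mention in mentions:
        for cluster in clusters:
            if similarity(mention, cluster[0]) >= threshold:
                cluster.append(mention)
                break
        else:
            # No existing cluster matched, so this mention seeds a new one.
            clusters.append([mention])
    return clusters


def build_kb(mentions, threshold=0.85):
    """Map each cluster's most frequent spelling to the sorted list of its variants."""
    kb = {}
    for cluster in cluster_mentions(mentions, threshold):
        canonical = Counter(cluster).most_common(1)[0][0]
        kb[canonical] = sorted(set(cluster))
    return kb
```

Choosing the most frequent spelling as the canonical form rests on the assumption that OCR and typing errors are less common than the correct form across a contract collection. A real system would likely need a more careful similarity measure and would use more entity attributes than the name alone.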
