Simplify Your Law: Using Information Theory to Deduplicate Legal Documents

Corinna Coupette; Jyotsna Singh; Holger Spamann

arXiv:2110.00735 · cs.CL · May 10, 2022

Open Access

TL;DR

This paper introduces Dupex, an information-theoretic algorithm inspired by software refactoring that detects duplicated phrases in legal texts so that they can be eliminated, making the texts easier to understand and maintain.

Contribution

It presents a novel application of the Minimum Description Length principle to legal document deduplication, adapting software refactoring techniques to legal text simplification.

Findings

01. Dupex effectively identifies duplicated phrases in legal texts.

02. The algorithm improves legal text clarity by reducing redundancy.

03. Experiments on US Code titles demonstrate practical utility.

Abstract

Textual redundancy is one of the main challenges to ensuring that legal texts remain comprehensible and maintainable. Drawing inspiration from the refactoring literature in software engineering, which has developed methods to expose and eliminate duplicated code, we introduce the duplicated phrase detection problem for legal texts and propose the Dupex algorithm to solve it. Leveraging the Minimum Description Length principle from information theory, Dupex identifies a set of duplicated phrases, called patterns, that together best compress a given input text. Through an extensive set of experiments on the Titles of the United States Code, we confirm that our algorithm works well in practice: Dupex will help you simplify your law.
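
To make the idea of "patterns that together best compress a given input text" concrete, here is a minimal Python sketch. It is not the Dupex algorithm from the paper: the token-count cost model, the greedy search, and every name in it (`saving`, `greedy_patterns`, the `<P0>` placeholder symbols) are assumptions chosen for illustration. It only shows the underlying Minimum Description Length trade-off: a phrase is worth extracting when the tokens it saves in the text outweigh the cost of storing it once in a pattern dictionary.

```python
def count_nonoverlapping(tokens, phrase):
    """Count left-to-right, non-overlapping occurrences of phrase in tokens."""
    n, count, i = len(phrase), 0, 0
    while i <= len(tokens) - n:
        if tuple(tokens[i:i + n]) == phrase:
            count += 1
            i += n
        else:
            i += 1
    return count


def replace(tokens, phrase, symbol):
    """Replace every non-overlapping occurrence of phrase with one symbol."""
    n, out, i = len(phrase), [], 0
    while i < len(tokens):
        if tuple(tokens[i:i + n]) == phrase:
            out.append(symbol)
            i += n
        else:
            out.append(tokens[i])
            i += 1
    return out


def saving(phrase, occurrences):
    """Toy cost model (an assumption): each occurrence shrinks to one symbol,
    and the dictionary entry costs the phrase length plus one symbol."""
    return occurrences * (len(phrase) - 1) - (len(phrase) + 1)


def greedy_patterns(tokens, min_len=2, max_len=6):
    """Greedily extract the phrase with the largest positive saving until
    no candidate shortens the total description any further."""
    tokens, patterns = list(tokens), {}
    while True:
        candidates = {
            tuple(tokens[i:i + n])
            for n in range(min_len, max_len + 1)
            for i in range(len(tokens) - n + 1)
        }
        best, best_gain = None, 0
        for phrase in sorted(candidates):  # sorted for deterministic ties
            gain = saving(phrase, count_nonoverlapping(tokens, phrase))
            if gain > best_gain:
                best, best_gain = phrase, gain
        if best is None:
            return tokens, patterns
        symbol = f"<P{len(patterns)}>"
        patterns[symbol] = best
        tokens = replace(tokens, best, symbol)


def decode(tokens, patterns):
    """Expand pattern symbols back into the original tokens (lossless check)."""
    out = []
    for tok in tokens:
        out.extend(decode(patterns[tok], patterns) if tok in patterns else [tok])
    return out


text = ("the secretary of the treasury shall report and "
        "the secretary of the treasury shall decide and "
        "the secretary of the treasury shall act").split()
encoded, patterns = greedy_patterns(text)
print(encoded)
print(patterns)
```

On this made-up sentence, the 23 input tokens shrink to 8 encoded tokens plus a single six-token dictionary entry, and `decode` restores the original exactly. A real system would need a principled encoding cost in bits and a far more scalable candidate search than this quadratic toy.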

Taxonomy

Topics: Software Engineering Research · Digital and Cyber Forensics · Web Application Security Vulnerabilities