Learning Bill Similarity with Annotated and Augmented Corpora of Bills
Jiseon Kim, Elden Griggs, In Song Kim, Alice Oh

TL;DR
This paper introduces a 5-class classification task and a human-labeled dataset of 4,721 subsection-level bill-to-bill relationships for identifying semantic similarity between legislative bills, and augments that dataset with synthetic data of varying similarity.
Contribution
It presents a novel 5-class classification task, a human-labeled dataset of bill relationships, and synthetic data augmentation to improve similarity detection in legal documents.
Findings
Performance improves with combined human and synthetic data training
Model effectively captures similarities at section and bill levels
Dataset and methodology facilitate better legislative document analysis
Abstract
Bill writing is a critical element of representative democracy. However, it is often overlooked that most legislative bills are derived, or even directly copied, from other bills. Despite the significance of bill-to-bill linkages for understanding the legislative process, existing approaches fail to address semantic similarities across bills, let alone reordering or paraphrasing, which are prevalent in legal document writing. In this paper, we overcome these limitations by proposing a 5-class classification task that closely reflects the nature of the bill generation process. In doing so, we construct a human-labeled dataset of 4,721 bill-to-bill relationships at the subsection level and release this annotated dataset to the research community. To augment the dataset, we generate synthetic data with varying degrees of similarity, mimicking the complex bill writing process. We use BERT…
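To make the augmentation idea concrete, here is a minimal sketch of how synthetic pairs with varying degrees of similarity could be produced from a single subsection. It is not the authors' procedure: the function names, the sentence-splitting rule, the 0.5 keep ratio, and the four label strings are all assumptions made for illustration, and they are not the five classes defined in the paper.

```python
import random

def reorder_sentences(sentences, rng):
    """Return the same sentences in a shuffled order (content unchanged)."""
    shuffled = sentences[:]
    rng.shuffle(shuffled)
    return shuffled

def drop_sentences(sentences, keep_ratio, rng):
    """Keep roughly keep_ratio of the sentences, preserving original order."""
    n_keep = max(1, round(len(sentences) * keep_ratio))
    kept_idx = sorted(rng.sample(range(len(sentences)), n_keep))
    return [sentences[i] for i in kept_idx]

def make_synthetic_pairs(subsection, unrelated, seed=0):
    """Build (source, variant, label) triples from one subsection.

    The labels are hypothetical placeholders, not the paper's classes.
    """
    rng = random.Random(seed)
    # Naive sentence split on periods; real bill text would need a better splitter.
    sentences = [s.strip() for s in subsection.split(".") if s.strip()]

    def join(parts):
        return ". ".join(parts) + "."

    return [
        (subsection, join(sentences), "same_text"),
        (subsection, join(reorder_sentences(sentences, rng)), "reordered"),
        (subsection, join(drop_sentences(sentences, 0.5, rng)), "partial_overlap"),
        (subsection, unrelated, "unrelated"),
    ]

if __name__ == "__main__":
    source = "A shall do X. B shall do Y. C shall do Z. D shall do W."
    for _, variant, label in make_synthetic_pairs(source, "Unrelated text."):
        print(label, "->", variant)
```

Paraphrasing, which the abstract also names, is left out because it would require a separate text-rewriting model; the sketch covers only reordering and deletion.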
Taxonomy
Topics: Indonesian Election Politics and Participation · Legal Language and Interpretation · Artificial Intelligence in Law
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Linear Warmup With Linear Decay · Weight Decay · Attention Dropout · Dropout · Layer Normalization · Softmax · Residual Connection
