BillSum: A Corpus for Automatic Summarization of US Legislation

Anastassia Kornilova; Vlad Eidelman

arXiv:1910.00523 · cs.CL · December 5, 2019

TL;DR

This paper introduces BillSum, a new dataset for automatic summarization of US legislation, benchmarks extractive summarization methods on it, and shows that models trained on Congressional bills transfer to California state bills.

Contribution

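BillSum is the first dataset for summarizing US legislation, covering Congressional and California state bills. The paper benchmarks extractive methods on it, describes the properties that make bills harder to summarize than text from other domains, and shows that models transfer from federal to state bills.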

Findings

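1. Extractive methods using neural sentence representations and traditional contextual features both perform effectively on legislative data.
2. Models trained on Congressional bills can be adapted to summarize California bills.
3. Legislative language makes the BillSum dataset more challenging to process than other domains such as news and scientific articles.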
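Judging whether a summarizer "performs effectively" requires a metric. Summarization benchmarks are commonly scored with ROUGE, which measures n-gram overlap between a system summary and a human-written reference. The sketch below computes a simplified ROUGE-1 F1 (unigram overlap) in plain Python; it is an illustration under that assumption, not the paper's evaluation code, and the function names are hypothetical.

```python
from collections import Counter
import re


def _unigrams(text: str) -> Counter:
    # Hypothetical helper: lowercase alphanumeric tokens, no stemming.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def rouge1_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 F1 between a candidate and a reference summary."""
    cand, ref = _unigrams(candidate), _unigrams(reference)
    # Counter intersection keeps the minimum count per word (clipped matches).
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Usage: a short extract scored against a longer reference summary.
print(rouge1_f1("the cat", "the cat sat on the mat"))
```

Established tooling such as Google's `rouge-score` package applies its own tokenizer and optional stemming, so its numbers will not match this sketch exactly.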

Abstract

Automatic summarization methods have been studied on a variety of domains, including news and scientific articles. Yet, legislation has not previously been considered for this task, despite US Congress and state governments releasing tens of thousands of bills every year. In this paper, we introduce BillSum, the first dataset for summarization of US Congressional and California state bills (https://github.com/FiscalNote/BillSum). We explain the properties of the dataset that make it more challenging to process than other domains. Then, we benchmark extractive methods that consider neural sentence representations and traditional contextual features. Finally, we demonstrate that models built on Congressional bills can be used to summarize California bills, thus showing that methods developed on this dataset can transfer to states without human-written summaries.
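Extractive summarization selects existing sentences from a document rather than generating new text. As a rough illustration of an extractive method built on traditional features, the hypothetical Python sketch below scores each sentence by the average document-level frequency of its content words plus a small bonus for appearing early, then keeps the top k sentences in their original order. It is not the authors' model; the stopword list, the `position_weight` parameter, and all function names are assumptions made for this example.

```python
from __future__ import annotations

import re
from collections import Counter

# Hypothetical stopword list chosen for this sketch only.
_STOPWORDS = {"the", "a", "an", "of", "to", "and", "or", "in", "for", "by",
              "on", "is", "be", "shall", "that", "this", "with", "as", "any"}


def split_sentences(text: str) -> list[str]:
    # Naive split after ".", "!" or "?" followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence: str) -> list[str]:
    # Lowercase alphabetic tokens with stopwords removed.
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in _STOPWORDS]


def extractive_summary(text: str, k: int = 2, position_weight: float = 0.1) -> list[str]:
    """Hypothetical frequency + position baseline, not the BillSum authors' method."""
    sentences = split_sentences(text)
    tokens = [tokenize(s) for s in sentences]
    # Document-wide word counts act as a crude centrality signal.
    freq = Counter(w for toks in tokens for w in toks)
    scores = []
    for i, toks in enumerate(tokens):
        content = sum(freq[w] for w in toks) / len(toks) if toks else 0.0
        # Earlier sentences receive a small bonus (a simple position feature).
        position = position_weight * (1.0 - i / len(sentences))
        scores.append(content + position)
    # Pick the k highest-scoring sentences, then restore document order.
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


bill = ("This Act may be cited as the Clean Water Grants Act. "
        "The Administrator shall award grants to states for clean water projects. "
        "Grants under this Act shall not exceed $5,000,000 per state. "
        "Nothing in this Act affects existing water rights.")
print(extractive_summary(bill, k=2))
```

The example bill is invented for illustration. Real bills would need a better sentence splitter, since headers such as "SEC. 2." would be split at the abbreviation by this naive regex.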

Code & Models

Models

andrejmiscic/simcls-scorer-billsum (Hugging Face model)
