The GELATO Dataset for Legislative NER
Matthew Flynn, Timothy Obiso, Sam Newman

TL;DR
This paper presents GELATO, a new annotated dataset of U.S. legislative texts for NER, and evaluates fine-tuned transformer models for first-level entity recognition and prompted LLMs for second-level recognition, finding that RoBERTa performs strongly while BERT is comparatively weak.
Contribution
Introduction of GELATO, a novel legislative NER dataset with a two-level ontology, and an evaluation of transformer models and LLMs for improved entity recognition.
Findings
RoBERTa outperforms BERT in first-level prediction
LLMs effectively complete second-level predictions
Model combinations show promise for legislative NER tasks
Abstract
This paper introduces GELATO (Government, Executive, Legislative, and Treaty Ontology), a dataset of U.S. House and Senate bills from the 118th Congress annotated using a novel two-level named entity recognition ontology designed for U.S. legislative texts. We fine-tune transformer-based models (BERT, RoBERTa) of different architectures and sizes on this dataset for first-level prediction. We then use LLMs with optimized prompts to complete the second-level prediction. The strong performance of RoBERTa and relatively weak performance of BERT models, as well as the application of LLMs as second-level predictors, support future research in legislative NER or downstream tasks using these model combinations as extraction tools.
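The two-stage setup described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the ontology labels, the function names (`bio_to_spans`, `build_prompt`, `refine`), the prompt wording, and the stub LLM are all hypothetical, and the first-level tagger is assumed to emit BIO tags.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical two-level ontology; the actual GELATO labels are not listed here.
ONTOLOGY: Dict[str, List[str]] = {
    "ORG": ["Committee", "Agency"],
    "LAW": ["Act", "Code Section"],
}


@dataclass
class Span:
    text: str
    level1: str
    level2: str = ""


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Span]:
    """Group BIO tags (as a fine-tuned first-level tagger might emit) into spans."""
    spans: List[Span] = []
    current: List[str] = []
    label = None
    for token, tag in zip(tokens, tags):
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != label
        )
        if starts_new:
            if current:
                spans.append(Span(" ".join(current), label))
            current, label = [token], tag[2:]
        elif tag.startswith("I-"):
            current.append(token)
        else:  # an "O" tag closes any open span
            if current:
                spans.append(Span(" ".join(current), label))
            current, label = [], None
    if current:
        spans.append(Span(" ".join(current), label))
    return spans


def build_prompt(sentence: str, span: Span) -> str:
    """Ask only for subtypes that are valid under the span's first-level label."""
    options = ", ".join(ONTOLOGY[span.level1])
    return (
        f"Sentence: {sentence}\n"
        f"Entity: {span.text} (type {span.level1})\n"
        f"Choose one subtype from: {options}"
    )


def refine(sentence: str, spans: List[Span], llm: Callable[[str], str]) -> List[Span]:
    """Fill in second-level labels, rejecting answers outside the ontology."""
    for span in spans:
        answer = llm(build_prompt(sentence, span)).strip()
        span.level2 = answer if answer in ONTOLOGY[span.level1] else ""
    return spans


def stub_llm(prompt: str) -> str:
    # Stand-in for a real LLM call: always returns the first listed option.
    return prompt.rsplit(": ", 1)[1].split(", ")[0]


tokens = ["The", "Committee", "on", "Finance", "amends", "the", "Clean", "Air", "Act"]
tags = ["O", "B-ORG", "I-ORG", "I-ORG", "O", "O", "B-LAW", "I-LAW", "I-LAW"]
result = refine(" ".join(tokens), bio_to_spans(tokens, tags), stub_llm)
```

Constraining the prompt to the subtypes of the predicted first-level label keeps the second stage from contradicting the first, at the cost of propagating any first-level error.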
Taxonomy
Topics: Computational and Text Analysis Methods · Sentiment Analysis and Opinion Mining · Topic Modeling
