The GELATO Dataset for Legislative NER

Matthew Flynn; Timothy Obiso; Sam Newman

arXiv:2603.14130·cs.CL·March 17, 2026

TL;DR

This paper presents GELATO, an annotated dataset of U.S. House and Senate bills from the 118th Congress for named entity recognition (NER). It fine-tunes BERT and RoBERTa models for first-level entity prediction and uses prompted LLMs for second-level prediction.

Contribution

Introduces GELATO, a legislative NER dataset annotated with a novel two-level ontology, and evaluates fine-tuned transformer models as first-level predictors and prompted LLMs as second-level predictors.

Findings

01. RoBERTa outperforms BERT in first-level prediction

02. LLMs effectively complete second-level predictions

03. Model combinations show promise for legislative NER tasks

Abstract

This paper introduces GELATO (Government, Executive, Legislative, and Treaty Ontology), a dataset of U.S. House and Senate bills from the 118th Congress annotated using a novel two-level named entity recognition ontology designed for U.S. legislative texts. We fine-tune transformer-based models (BERT, RoBERTa) of different architectures and sizes on this dataset for first-level prediction. We then use LLMs with optimized prompts to complete the second-level prediction. The strong performance of RoBERTa and the relatively weak performance of the BERT models, together with the application of LLMs as second-level predictors, support future research on legislative NER and on downstream tasks that use these model combinations as extraction tools.
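The two-stage setup described in the abstract can be pictured with a minimal sketch. It assumes the first-level model emits token-level BIO tags and that each recovered span is then passed to an LLM in a text prompt. The abstract specifies neither detail, so the tag scheme, the label names (`TITLE`, `CABINET_OFFICIAL`, `LEGISLATOR`), the prompt wording, and both function names are hypothetical and are not taken from the GELATO ontology or the authors' code.

```python
from typing import List, Optional, Tuple

Span = Tuple[str, str]  # (first-level label, surface text)


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Span]:
    """Collapse token-level BIO tags into (label, text) spans."""
    spans: List[Span] = []
    label: Optional[str] = None
    words: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always closes any open span and starts a new one.
            if label is not None:
                spans.append((label, " ".join(words)))
            label, words = tag[2:], [token]
        elif tag.startswith("I-") and label == tag[2:]:
            # Continuation of the currently open span.
            words.append(token)
        else:
            # "O", or an I- tag that does not continue the open span:
            # close the open span (if any) and treat this token as outside.
            if label is not None:
                spans.append((label, " ".join(words)))
            label, words = None, []
    if label is not None:
        spans.append((label, " ".join(words)))
    return spans


def second_level_prompt(sentence: str, span: Span, candidates: List[str]) -> str:
    """Build a prompt asking an LLM to refine one first-level span."""
    label, text = span
    options = ", ".join(candidates)
    return (
        f"Sentence: {sentence}\n"
        f'Entity: "{text}" (first-level type: {label})\n'
        f"Choose exactly one second-level type from: {options}\n"
        "Answer with the type name only."
    )


# Hypothetical example: labels and candidates are placeholders, not GELATO types.
tokens = ["The", "Secretary", "of", "Energy", "shall", "submit", "a", "report", "."]
tags = ["O", "B-TITLE", "I-TITLE", "I-TITLE", "O", "O", "O", "O", "O"]
spans = bio_to_spans(tokens, tags)
prompt = second_level_prompt(
    " ".join(tokens), spans[0], ["CABINET_OFFICIAL", "LEGISLATOR"]
)
print(spans)
print(prompt)
```

Splitting the work this way keeps the fine-tuned model responsible only for finding spans and assigning coarse types, while the prompt restricts the LLM to a closed set of finer-grained options for a span that has already been located.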

Code & Models

Datasets

Wollaston/gelato
dataset · 45 dl

Taxonomy

Topics: Computational and Text Analysis Methods · Sentiment Analysis and Opinion Mining · Topic Modeling