Supercharging Agenda Setting Research: The ParlaCAP Dataset of 28 European Parliaments and a Scalable Multilingual LLM-Based Classification

Taja Kuzman Pungeršek; Peter Rupnik; Daniela Širinić; Nikola Ljubešić

arXiv:2602.16516·cs.CL·May 1, 2026

TL;DR

This paper presents ParlaCAP, a large multilingual dataset of European parliamentary speeches, and a scalable LLM-based method for classifying policy topics, enabling comparative political analysis.

Contribution

It introduces a novel dataset and a cost-effective, scalable classification method using LLMs, improving domain-specific policy topic annotation across multiple languages.

Findings

01

The LLM-based classifier matches human agreement levels.

02

The classifier outperforms existing out-of-domain CAP classifiers.

03

The dataset enables analysis of political attention, sentiment, and gender differences.
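Finding 01 rests on comparing two agreement scores: LLM versus human, and human versus human. The summary here does not name the agreement metric the authors used, so the sketch below assumes Cohen's kappa purely for illustration. The function name `cohen_kappa` and the toy labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    Assumption: kappa is used here only as an example of a
    chance-corrected agreement score; the paper's metric may differ.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement under independence, from each annotator's label counts.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Toy example with made-up annotations (illustrative topic names only).
human = ["Health", "Education", "Health", "Defense"]
llm = ["Health", "Education", "Defense", "Defense"]
kappa_llm_human = cohen_kappa(human, llm)
```

Under this assumption, the claim in Finding 01 would correspond to the LLM-human score landing in the same range as the score computed between two human annotators on the same items.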

Abstract

This paper introduces ParlaCAP, a large-scale dataset for analyzing parliamentary agenda setting across Europe, and proposes a cost-effective method for building domain-specific policy topic classifiers. Applying the Comparative Agendas Project (CAP) schema to the multilingual ParlaMint corpus of over 8 million speeches from 28 parliaments of European countries and autonomous regions, we follow a teacher-student framework in which a high-performing large language model (LLM) annotates in-domain training data and a multilingual encoder model is fine-tuned on these annotations for scalable data annotation. We show that this approach produces a classifier tailored to the target domain. Agreement between the LLM and human annotators is comparable to inter-annotator agreement among humans, and the resulting model outperforms existing CAP classifiers trained on manually-annotated but…

Code & Models

Models

classla/ParlaCAP-Topic-Classifier (Hugging Face model; 121 downloads, 5 likes)
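A minimal sketch of how the listed model might be applied to speeches, assuming it loads as a standard sequence-classification checkpoint through the Hugging Face `transformers` `pipeline` API. That assumption is not confirmed by this page, so the model card should be checked for the recommended usage. The helper name `classify_speeches` is hypothetical.

```python
MODEL_ID = "classla/ParlaCAP-Topic-Classifier"


def classify_speeches(speeches, model_id=MODEL_ID):
    """Return a (label, score) pair for each speech.

    Assumption: the checkpoint works with the generic
    "text-classification" pipeline; this is not stated on the page.
    """
    # Imported lazily so the module can be loaded without transformers installed.
    from transformers import pipeline

    classifier = pipeline("text-classification", model=model_id)
    # truncation=True cuts speeches that exceed the model's maximum input length.
    results = classifier(list(speeches), truncation=True)
    return [(r["label"], r["score"]) for r in results]
```

Calling `classify_speeches(["..."])` downloads the model on first use, so it needs network access and a working `transformers` installation.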
