AgoraSpeech: A multi-annotated comprehensive dataset of political discourse through the lens of humans and AI
Pavlos Sermpezis, Stelios Karamanidis, Eva Paraschou, Ilias Dimitriadis, Sofia Yfantidou, Filitsa-Ioanna Kouskouveli, Thanasis Troboukis, Kelly Kiki, Antonis Galanopoulos, and Athena Vakali

TL;DR
AgoraSpeech is a high-quality, multi-annotated dataset of Greek political speeches from the 2023 national elections, built by combining AI-generated annotations with human validation and intended for NLP tasks and social science research.
Contribution
It introduces a comprehensive, multi-task annotated political speech dataset with a novel two-step AI-human validation process, filling a gap in high-quality political discourse corpora.
Findings
Dataset enables nuanced political analysis
Effective AI-human annotation pipeline demonstrated
Useful for benchmarking NLP models
Abstract
Political discourse datasets are important for gaining political insights, analyzing communication strategies or social science phenomena. Although numerous political discourse corpora exist, comprehensive, high-quality, annotated datasets are scarce. This is largely due to the substantial manual effort, multidisciplinarity, and expertise required for the nuanced annotation of rhetorical strategies and ideological contexts. In this paper, we present AgoraSpeech, a meticulously curated, high-quality dataset of 171 political speeches from six parties during the Greek national elections in 2023. The dataset includes annotations (per paragraph) for six natural language processing (NLP) tasks: text classification, topic identification, sentiment analysis, named entity recognition, polarization and populism detection. A two-step annotation was employed, starting with ChatGPT-generated…
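As a rough illustration of the per-paragraph, multi-task annotation described above, the following Python sketch models one annotated paragraph and measures how often a human validator kept the AI-generated label unchanged. It is not the dataset's actual schema: the class name, field names, task keys, and label values are all hypothetical, and labels are simplified to plain strings.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# The six tasks named in the abstract. The key spellings are assumptions,
# not the dataset's real column names.
TASKS = (
    "text_classification",
    "topic_identification",
    "sentiment_analysis",
    "named_entity_recognition",
    "polarization",
    "populism",
)


@dataclass
class ParagraphAnnotation:
    """One paragraph of a speech with AI and human-validated labels (hypothetical layout)."""
    speech_id: str
    paragraph_index: int
    text: str
    ai_labels: Dict[str, str] = field(default_factory=dict)     # step 1: AI-generated
    human_labels: Dict[str, str] = field(default_factory=dict)  # step 2: human-validated


def ai_acceptance_rate(records: List[ParagraphAnnotation], task: str) -> Optional[float]:
    """Fraction of paragraphs where the human label equals the AI label for `task`.

    Returns None when no paragraph carries both labels for that task.
    """
    compared = [r for r in records if task in r.ai_labels and task in r.human_labels]
    if not compared:
        return None
    kept = sum(1 for r in compared if r.ai_labels[task] == r.human_labels[task])
    return kept / len(compared)


# Toy records with made-up label values, for illustration only.
records = [
    ParagraphAnnotation("speech-001", 0, "First paragraph.",
                        ai_labels={"sentiment_analysis": "positive"},
                        human_labels={"sentiment_analysis": "positive"}),
    ParagraphAnnotation("speech-001", 1, "Second paragraph.",
                        ai_labels={"sentiment_analysis": "neutral"},
                        human_labels={"sentiment_analysis": "negative"}),
]
rate = ai_acceptance_rate(records, "sentiment_analysis")
```

Keeping the AI and human labels side by side, rather than overwriting one with the other, is a design choice of this sketch; it makes per-task comparisons such as the one above straightforward.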
