AgoraSpeech: A multi-annotated comprehensive dataset of political discourse through the lens of humans and AI

Pavlos Sermpezis; Stelios Karamanidis; Eva Paraschou; Ilias Dimitriadis; Sofia Yfantidou; Filitsa-Ioanna Kouskouveli; Thanasis Troboukis; Kelly Kiki; Antonis Galanopoulos; Athena Vakali

arXiv:2501.06265·cs.CL·January 14, 2025

TL;DR

AgoraSpeech is a high-quality, multi-annotated dataset of Greek political speeches from 2023, designed for NLP tasks and social science research, created through combined AI and human validation.

Contribution

It introduces a comprehensive, multi-task annotated political speech dataset with a novel two-step AI-human validation process, filling a gap in high-quality political discourse corpora.

Findings

01. Dataset enables nuanced political analysis

02. Effective AI-human annotation pipeline demonstrated

03. Useful for benchmarking NLP models

Abstract

Political discourse datasets are important for gaining political insights, analyzing communication strategies or social science phenomena. Although numerous political discourse corpora exist, comprehensive, high-quality, annotated datasets are scarce. This is largely due to the substantial manual effort, multidisciplinarity, and expertise required for the nuanced annotation of rhetorical strategies and ideological contexts. In this paper, we present AgoraSpeech, a meticulously curated, high-quality dataset of 171 political speeches from six parties during the Greek national elections in 2023. The dataset includes annotations (per paragraph) for six natural language processing (NLP) tasks: text classification, topic identification, sentiment analysis, named entity recognition, polarization and populism detection. A two-step annotation was employed, starting with ChatGPT-generated…
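
To make the per-paragraph, multi-task annotation concrete, here is a minimal Python sketch of how one such record might be represented. Only the six task names come from the abstract. The class names, field names, the `correction_rate` helper, and the rule that a human label overrides the AI label are all assumptions made for illustration; they are not the dataset's actual schema or the authors' validation procedure.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# The six per-paragraph tasks named in the abstract.
TASKS = (
    "text_classification",
    "topic_identification",
    "sentiment_analysis",
    "named_entity_recognition",
    "polarization_detection",
    "populism_detection",
)


@dataclass
class TaskAnnotation:
    """One task's label for one paragraph (hypothetical layout)."""
    ai_label: str                      # step 1: model-generated label
    human_label: Optional[str] = None  # step 2: filled in only if a human revised it

    @property
    def final_label(self) -> str:
        # Assumption: a human revision, when present, overrides the AI label.
        return self.human_label if self.human_label is not None else self.ai_label


@dataclass
class Paragraph:
    """A single speech paragraph with its annotations, keyed by task name."""
    speech_id: str
    index: int
    text: str
    annotations: Dict[str, TaskAnnotation] = field(default_factory=dict)


def correction_rate(paragraphs: List[Paragraph], task: str) -> float:
    """Fraction of paragraphs annotated for `task` whose AI label a human changed."""
    annotated = [p.annotations[task] for p in paragraphs if task in p.annotations]
    if not annotated:
        return 0.0
    changed = sum(
        1 for a in annotated
        if a.human_label is not None and a.human_label != a.ai_label
    )
    return changed / len(annotated)
```

Keeping the AI label and the human label side by side, rather than storing only a final label, would let the same records serve both as training data and as a benchmark of how often the model's first pass needed revision.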
