A Greek Parliament Proceedings Dataset for Computational Linguistics and Political Analysis

Konstantina Dritsa; Kaiti Thoma; John Pavlopoulos; Panos Louridas

arXiv:2210.12883 · cs.CL · October 25, 2022 · 1 citation

TL;DR

This paper introduces a curated dataset of Greek Parliament proceedings from 1989 to 2020, comprising more than 1 million speeches with extensive metadata, and shows how it supports computational linguistics and political analysis, including the study of word usage change and semantic shifts.

Contribution

It provides the first large-scale, diachronic Greek parliamentary dataset with extensive metadata, facilitating research in linguistics and political science.

Findings

01. Demonstrated the dataset's use in analyzing changes in word usage over time

02. Showcased methods for detecting semantic shifts in political discourse

03. Enabled studies correlating language with historical and political events

Abstract

Large, diachronic datasets of political discourse are hard to come across, especially for resource-lean languages such as Greek. In this paper, we introduce a curated dataset of the Greek Parliament Proceedings that extends chronologically from 1989 up to 2020. It consists of more than 1 million speeches with extensive metadata, extracted from 5,355 parliamentary record files. We explain how it was constructed and the challenges that we had to overcome. The dataset can be used for both computational linguistics and political analysis, ideally combining the two. We present such an application, showing how the dataset can be used to study the change of word usage (i) through time, (ii) between significant historical events and political parties, and (iii) by evaluating and employing algorithms for detecting semantic shifts.
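As a rough illustration of studying word usage through time, the sketch below computes the relative frequency of a word per year. It is a minimal example under stated assumptions, not the authors' method: it assumes speeches are already available as (year, text) pairs, uses naive whitespace tokenization, and the function name and toy records are hypothetical and do not reflect the dataset's actual schema.

```python
from collections import Counter


def relative_frequency_by_year(speeches, target):
    """Return {year: occurrences of target / total tokens in that year}.

    `speeches` is an iterable of (year, text) pairs; this layout is an
    assumption for illustration, not the dataset's real format.
    """
    hits = Counter()
    totals = Counter()
    for year, text in speeches:
        tokens = text.lower().split()  # naive whitespace tokenization
        totals[year] += len(tokens)
        hits[year] += tokens.count(target)
    # Skip years with no tokens to avoid division by zero
    return {y: hits[y] / totals[y] for y in sorted(totals) if totals[y]}


# Toy records invented for this example, not taken from the dataset
toy = [
    (1990, "η οικονομία και η ανάπτυξη"),
    (1990, "η παιδεία"),
    (2012, "η κρίση και η οικονομία και η κρίση"),
]
freqs = relative_frequency_by_year(toy, "κρίση")
```

Normalizing by the yearly token count keeps years with more recorded speech from dominating the comparison. A real analysis would likely need proper Greek tokenization and lemmatization, which this sketch omits.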

Code & Models

Repositories

dritsa-konstantina/greparl (Official)

Videos

A Greek Parliament Proceedings Dataset for Computational Linguistics and Political Analysis · SlidesLive

Taxonomy

Topics: Computational and Text Analysis Methods · Language and cultural evolution · Natural Language Processing Techniques