A Greek Government Decisions Dataset for Public-Sector Analysis and Insight
Giorgos Antoniou, Giorgos Filandrianos, Aggelos Vlachos, Giorgos Stamou, Lampros Kollimenos, Konstantinos Skianis, Michalis Vazirgiannis

TL;DR
This paper presents a large-scale, high-quality Greek government decisions dataset, along with analysis and a retrieval-augmented generation system, to enhance public-sector transparency, information access, and support for language models in governmental domains.
Contribution
It introduces a comprehensive, machine-readable corpus of Greek government decisions and a reproducible extraction pipeline, and evaluates a RAG system for question answering over public decisions.
Findings
The dataset contains 1 million decisions with high-quality raw text.
A baseline RAG system effectively retrieves and reasons over government documents.
The corpus supports training and fine-tuning language models for legal and governmental tasks.
Abstract
We introduce an open, machine-readable corpus of Greek government decisions sourced from the national transparency platform Diavgeia. The resource comprises 1 million decisions, featuring high-quality raw text extracted from PDFs. It is released with raw extracted text in Markdown format, alongside a fully reproducible extraction pipeline. Beyond the core dataset, we conduct qualitative analyses to explore boilerplate patterns and design a retrieval-augmented generation (RAG) task by formulating a set of representative questions, creating high-quality answers, and evaluating a baseline RAG system on its ability to retrieve and reason over public decisions. This evaluation demonstrates the potential of large-scale public-sector corpora to support advanced information access and transparency through structured retrieval and reasoning over governmental documents, and highlights how…
Taxonomy
Topics: Computational and Text Analysis Methods · Topic Modeling · Text Readability and Simplification
