The Knesset Corpus: An Annotated Corpus of Hebrew Parliamentary Proceedings
Gili Goldin (1), Nick Howell (2), Noam Ordan (2), Ella Rabinovich (3), Shuly Wintner (1) ((1) Department of Computer Science, University of Haifa, Israel, (2) IAHLT, Israel, (3) School of Computer Science, The Academic College of Tel-Aviv Yaffo, Israel)

TL;DR
The Knesset Corpus is a comprehensive, annotated dataset of Hebrew parliamentary proceedings from 1998 to 2022, enabling diverse research in linguistics, political science, and social studies.
Contribution
This paper introduces a large corpus of Hebrew parliamentary proceedings, annotated with morpho-syntactic information and linked to demographic and political metadata about the speakers.
Findings
Lexical richness decreased over time in proceedings
Gender-based stylistic differences observed among speakers
Corpus supports interdisciplinary research in social sciences
Abstract
We present the Knesset Corpus, a corpus of Hebrew parliamentary proceedings containing over 30 million sentences (over 384 million tokens) from all plenary and committee protocols of the Israeli parliament between 1998 and 2022. Sentences are annotated with morpho-syntactic information and are associated with detailed meta-information reflecting demographic and political properties of the speakers, based on a large database of parliament members and factions that we compiled. We discuss the structure and composition of the corpus and the various processing steps we applied to it. To demonstrate the utility of this novel dataset, we present two use cases. We show that the corpus can be used to examine historical developments in the style of political discussions by showing a reduction in lexical richness in the proceedings over time. We also investigate some differences between…
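The abstract does not say which measure of lexical richness the authors use. As a rough illustration only, the sketch below computes a windowed type-token ratio (TTR) per year, a common stand-in that is not claimed to be the paper's method. The `(year, sentence_text)` record layout, the function names, and the whitespace tokenization are all assumptions made for this example; the corpus's own morpho-syntactic annotations would likely give a better token or lemma stream for Hebrew.

```python
from collections import defaultdict


def windowed_ttr(tokens, window=1000):
    """Mean type-token ratio over consecutive, non-overlapping windows.

    Fixed-size windows keep the value comparable across texts of different
    lengths. A trailing partial window is ignored; if the text is shorter
    than one window, the plain TTR of the whole text is returned.
    """
    if not tokens:
        return 0.0
    if len(tokens) < window:
        return len(set(tokens)) / len(tokens)
    ratios = []
    for start in range(0, len(tokens) - window + 1, window):
        chunk = tokens[start:start + window]
        ratios.append(len(set(chunk)) / window)
    return sum(ratios) / len(ratios)


def richness_by_year(records, window=1000):
    """Map each year to its windowed TTR.

    `records` is an iterable of (year, sentence_text) pairs. This layout is
    hypothetical and is not the corpus's actual distribution format.
    """
    tokens_by_year = defaultdict(list)
    for year, text in records:
        # Whitespace splitting is a simplification for illustration.
        tokens_by_year[year].extend(text.split())
    return {
        year: windowed_ttr(tokens, window)
        for year, tokens in sorted(tokens_by_year.items())
    }
```

A falling value from earlier to later years would be consistent with the reduction in lexical richness reported above, though the window size and tokenization both affect the absolute numbers.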
