Knesset-DictaBERT: A Hebrew Language Model for Parliamentary Proceedings

Gili Goldin (1), Shuly Wintner (1) ((1) Department of Computer Science, University of Haifa, Israel)

arXiv:2407.20581 · cs.CL · July 31, 2024

TL;DR

Knesset-DictaBERT is a Hebrew language model tailored for parliamentary texts, showing enhanced understanding of parliamentary language through fine-tuning on Israeli Knesset proceedings.

Contribution

This paper introduces Knesset-DictaBERT, a Hebrew language model fine-tuned on Israeli parliamentary proceedings (the Knesset Corpus), which improves perplexity and accuracy on the masked language modeling (MLM) task over the baseline DictaBERT model.

Findings

01 Significant perplexity reduction over baseline
02 Improved accuracy in MLM tasks
03 Enhanced understanding of parliamentary language

Abstract

We present Knesset-DictaBERT, a large Hebrew language model fine-tuned on the Knesset Corpus, which comprises Israeli parliamentary proceedings. The model is based on the DictaBERT architecture and demonstrates significant improvements in understanding parliamentary language according to the MLM task. We provide a detailed evaluation of the model's performance, showing improvements in perplexity and accuracy over the baseline DictaBERT model.
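The two metrics named in the abstract can be illustrated with a small sketch. It assumes the common definitions: MLM perplexity as the exponential of the mean negative log-probability that the model assigns to the true token at each masked position, and top-k accuracy as the fraction of masked positions whose true token appears among the k highest-ranked predictions. The function names and toy inputs below are hypothetical, and the paper's exact evaluation protocol may differ.

```python
import math


def mlm_perplexity(true_token_probs):
    """Perplexity over masked positions.

    true_token_probs: probability the model assigned to the correct token
    at each masked position (illustrative input, not the paper's data).
    """
    if not true_token_probs:
        raise ValueError("need at least one masked position")
    mean_nll = -sum(math.log(p) for p in true_token_probs) / len(true_token_probs)
    return math.exp(mean_nll)


def top_k_accuracy(ranked_predictions, gold_tokens, k=1):
    """Fraction of masked positions whose gold token is in the top-k predictions.

    ranked_predictions: one list per masked position, best candidate first.
    gold_tokens: the true token for each masked position.
    """
    if len(ranked_predictions) != len(gold_tokens):
        raise ValueError("predictions and gold tokens must align")
    if not gold_tokens:
        raise ValueError("need at least one masked position")
    hits = sum(
        1 for ranked, gold in zip(ranked_predictions, gold_tokens)
        if gold in ranked[:k]
    )
    return hits / len(gold_tokens)


if __name__ == "__main__":
    # Toy values only: three masked positions.
    probs = [0.5, 0.25, 0.125]
    preds = [["a", "b"], ["c", "d"], ["e", "f"]]
    gold = ["a", "d", "x"]
    print("perplexity:", mlm_perplexity(probs))
    print("top-1 accuracy:", top_k_accuracy(preds, gold, k=1))
    print("top-2 accuracy:", top_k_accuracy(preds, gold, k=2))
```

Under these definitions, lower perplexity and higher top-k accuracy both indicate that a model predicts masked tokens in the evaluation text more reliably.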

Code & Models

Models

Hugging Face: GiliGold/Knesset-DictaBERT (model, 22 downloads, 2 likes)
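A minimal sketch of how the listed checkpoint might be queried for masked-token predictions with the Hugging Face `transformers` library. It assumes that the repository ships a tokenizer and masked-LM weights loadable through `AutoTokenizer` and `AutoModelForMaskedLM`, which this page does not confirm. The helper names `build_fill_mask` and `top_predictions` are hypothetical.

```python
MODEL_ID = "GiliGold/Knesset-DictaBERT"  # repository name as listed on this page


def build_fill_mask(model_id=MODEL_ID):
    """Build a fill-mask pipeline (downloads weights on first use).

    Assumption: the repository is loadable as a masked language model.
    """
    # Imported lazily so the module can be imported without transformers installed.
    from transformers import AutoModelForMaskedLM, AutoTokenizer, pipeline

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForMaskedLM.from_pretrained(model_id)
    return pipeline("fill-mask", model=model, tokenizer=tokenizer)


def top_predictions(fill_mask, text_with_mask, k=5):
    """Return (token, score) pairs for a text containing exactly one mask token."""
    results = fill_mask(text_with_mask, top_k=k)
    return [(r["token_str"], r["score"]) for r in results]


if __name__ == "__main__":
    fill_mask = build_fill_mask()
    # Use the tokenizer's own mask token rather than hard-coding "[MASK]".
    mask = fill_mask.tokenizer.mask_token
    # Placeholder sentence; substitute a Hebrew sentence from parliamentary text.
    for token, score in top_predictions(fill_mask, f"Hello {mask}", k=5):
        print(token, round(score, 4))
```

Running the same masked sentences through the baseline DictaBERT checkpoint would give a quick qualitative comparison alongside the quantitative metrics reported in the paper.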

Taxonomy

Topics: Natural Language Processing Techniques · Legal Language and Interpretation