Introducing BEREL: BERT Embeddings for Rabbinic-Encoded Language
Avi Shmidman, Joshua Guedalia, Shaltiel Shmidman, Cheyn Shmuel Shmidman, Eli Handel, Moshe Koppel

TL;DR
This paper introduces Berel, a BERT-based language model pre-trained for Rabbinic Hebrew. On Rabbinic texts, it outperforms existing Hebrew models, which are trained on modern Hebrew.
Contribution
The paper presents a new pre-trained language model tailored for Rabbinic Hebrew, addressing the gap left by models trained on modern Hebrew texts.
Findings
Berel outperforms existing Hebrew PLMs on Rabbinic texts.
The model is validated using a homograph challenge set.
The model and challenge set are publicly released.
Abstract
We present a new pre-trained language model (PLM) for Rabbinic Hebrew, termed Berel (BERT Embeddings for Rabbinic-Encoded Language). Whilst other PLMs exist for processing Hebrew texts (e.g., HeBERT, AlephBERT), they are all trained on modern Hebrew texts, which diverge substantially from Rabbinic Hebrew in their lexicographical, morphological, syntactic and orthographic norms. We demonstrate the superiority of Berel on Rabbinic texts via a challenge set of Hebrew homographs. We release the new model and homograph challenge set for unrestricted use.
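The abstract does not say how the homograph challenge set is scored. As an illustration only, the sketch below shows one common way to score such a set: accuracy per homograph, then a macro average so that frequent homographs do not dominate. The function name `score_homographs`, the tuple layout, the analysis labels, and the toy data are all hypothetical and are not taken from the paper or its released files.

```python
from collections import defaultdict


def score_homographs(examples, predict):
    """Score a homograph disambiguation challenge set (hypothetical layout).

    examples: iterable of (sentence, homograph, gold_analysis) tuples.
    predict:  callable (sentence, homograph) -> predicted analysis.
    Returns (per-homograph accuracy dict, macro-averaged accuracy).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for sentence, homograph, gold in examples:
        total[homograph] += 1
        if predict(sentence, homograph) == gold:
            correct[homograph] += 1
    per_homograph = {h: correct[h] / total[h] for h in total}
    # Macro average: every homograph counts equally, whatever its frequency.
    macro = sum(per_homograph.values()) / len(per_homograph)
    return per_homograph, macro


# Toy placeholder data; real entries would be Rabbinic Hebrew sentences.
toy_examples = [
    ("sentence 1", "h1", "noun"),
    ("sentence 2", "h1", "verb"),
    ("sentence 3", "h2", "noun"),
    ("sentence 4", "h2", "noun"),
]


def always_noun(sentence, homograph):
    """Trivial baseline that ignores context entirely."""
    return "noun"


per_homograph, macro = score_homographs(toy_examples, always_noun)
print(per_homograph, macro)
```

In a real comparison, `predict` would wrap a classifier built on each model under test (Berel, HeBERT, AlephBERT), and the context-free baseline gives a floor against which to read their scores.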
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
