Style Classification of Rabbinic Literature for Detection of Lost Midrash Tanhuma Material
Shlomo Tannor, Nachum Dershowitz, Moshe Lavee

TL;DR
This paper introduces a style-based classification system for rabbinic texts using NLP, aiding scholars in identifying origins and uncovering lost material in Midrash collections, especially Tanḥuma-Yelammedenu.
Contribution
It presents a novel NLP approach for classifying rabbinic literature by style, facilitating detection of lost Midrash Tanhuma material and improving text attribution.
Findings
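Proposed a system that classifies rabbinic texts by style, building on recent NLP advances for Hebrew.
Demonstrated how the method can be applied to uncover lost Tanḥuma-Yelammedenu material preserved in later anthologies.
Offered a tool for attributing passages to their source, a question often disputed among scholars.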
Abstract
Midrash collections are complex rabbinic works that consist of text in multiple languages, which evolved through long processes of unstable oral and written transmission. Determining the origin of a given passage in such a compilation is not always straightforward and is often a matter of dispute among scholars, yet it is essential for scholars' understanding of the passage and its relationship to other texts in the rabbinic corpus. To help solve this problem, we propose a system for classification of rabbinic literature based on its style, leveraging recent advances in natural language processing for Hebrew texts. Additionally, we demonstrate how this method can be applied to uncover lost material from a specific midrash genre, Tanḥuma-Yelammedenu, that has been preserved in later anthologies.
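The abstract describes the pipeline only at a high level: classify a passage by style, then use the classifier to spot anthology passages that look like Tanḥuma-Yelammedenu. The method list on this page suggests the authors use a Transformer-based model. The sketch below is not that model. It is a much simpler, swapped-in technique (multinomial naive Bayes over character n-grams, plain Python) that only illustrates the shape of the task. The class and function names, the labels `tanhuma` and `other`, the n-gram size, and the 0.9 confidence threshold are all hypothetical choices, not the authors'.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text: str, n: int = 3) -> list[str]:
    """Split a passage into overlapping character n-grams (spaces included)."""
    padded = f" {text} "
    return [padded[i : i + n] for i in range(len(padded) - n + 1)]


class NgramStyleClassifier:
    """Hypothetical stand-in: multinomial naive Bayes over character n-grams.

    This is NOT the paper's model; it only illustrates style-based attribution.
    """

    def __init__(self, n: int = 3, alpha: float = 1.0):
        self.n = n
        self.alpha = alpha  # Laplace smoothing constant
        self.counts: dict[str, Counter] = defaultdict(Counter)  # n-gram counts per label
        self.totals: Counter = Counter()  # total n-grams seen per label
        self.docs: Counter = Counter()  # number of training passages per label
        self.vocab: set[str] = set()

    def fit(self, passages: list[str], labels: list[str]) -> "NgramStyleClassifier":
        for text, label in zip(passages, labels):
            grams = char_ngrams(text, self.n)
            self.counts[label].update(grams)
            self.totals[label] += len(grams)
            self.docs[label] += 1
            self.vocab.update(grams)
        return self

    def log_posteriors(self, text: str) -> dict[str, float]:
        """Return normalized log-probabilities of each label for the passage."""
        grams = char_ngrams(text, self.n)
        n_docs = sum(self.docs.values())
        v = len(self.vocab)
        scores = {}
        for label in self.docs:
            score = math.log(self.docs[label] / n_docs)  # log prior
            denom = self.totals[label] + self.alpha * v
            for g in grams:
                score += math.log((self.counts[label][g] + self.alpha) / denom)
            scores[label] = score
        # Normalize with log-sum-exp so the posteriors sum to 1.
        m = max(scores.values())
        log_z = m + math.log(sum(math.exp(s - m) for s in scores.values()))
        return {label: s - log_z for label, s in scores.items()}

    def predict(self, text: str) -> tuple[str, float]:
        post = self.log_posteriors(text)
        label = max(post, key=post.get)
        return label, math.exp(post[label])


def flag_candidates(clf, anthology_passages, target="tanhuma", threshold=0.9):
    """Return (passage, probability) pairs attributed to `target` with high confidence."""
    flagged = []
    for text in anthology_passages:
        label, prob = clf.predict(text)
        if label == target and prob >= threshold:
            flagged.append((text, prob))
    return flagged


if __name__ == "__main__":
    # Toy placeholder passages only; a real experiment would train on passages
    # whose source collection is already established.
    texts = ["ילמדנו רבנו מהו", "ילמדנו רבנו אדם", "תנו רבנן", "אמר רבי יוחנן"]
    labels = ["tanhuma", "tanhuma", "other", "other"]
    clf = NgramStyleClassifier().fit(texts, labels)
    print(clf.predict("ילמדנו רבנו"))
    print(flag_candidates(clf, ["ילמדנו רבנו", "אמר רבי"]))
```

Character n-grams are used here only because they need no tokenizer or pretrained model; a passage flagged by any such classifier is a candidate for scholarly review, not a confirmed attribution.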
Taxonomy
Topics: Biblical Studies and Interpretation · Archaeology and Historical Studies · Historical and Linguistic Studies
Methods: Multi-Head Attention · Attention Is All You Need · Position-Wise Feed-Forward Layer · Label Smoothing · Residual Connection · Linear Layer · Adam · Layer Normalization · Softmax · Absolute Position Encodings
