Restoring Hebrew Diacritics Without a Dictionary
Elazar Gershuni, Yuval Pinter

TL;DR
This paper introduces NAKDIMON, a two-layer character-level LSTM that restores Hebrew diacritics using no human-curated resources other than plain diacritized text, and performs on par with much more complicated curation-dependent systems.
Contribution
The paper presents a simple neural model for Hebrew diacritization that needs no human-curated resources beyond plain diacritized text, yet performs on par with more complex, curation-dependent systems.
Findings
NAKDIMON performs on par with much more complicated curation-dependent systems
Diacritization is achieved without human-curated resources other than plain diacritized text
Results hold across a diverse array of modern Hebrew sources
Abstract
We demonstrate that it is feasible to diacritize Hebrew script without any human-curated resources other than plain diacritized text. We present NAKDIMON, a two-layer character-level LSTM that performs on par with much more complicated curation-dependent systems across a diverse array of modern Hebrew sources.
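Since the only supervision the abstract mentions is plain diacritized text, one way to picture the setup is as character-level labeling: each base letter is paired with the diacritic marks that follow it. The sketch below is an illustration under that assumption, not the authors' preprocessing. The function names `split_diacritics` and `attach_diacritics` are hypothetical, and treating every Unicode combining mark as a label is a simplification.

```python
import unicodedata


def split_diacritics(text):
    """Split diacritized text into base characters and per-character marks.

    Returns (bases, marks), where marks[i] is the (possibly empty) string of
    combining marks that followed bases[i] in the input.
    """
    bases = []
    marks = []
    for ch in text:
        # Hebrew points (niqqud, shin/sin dots, dagesh) have a nonzero
        # canonical combining class; letters and punctuation have class 0.
        if unicodedata.combining(ch) != 0 and bases:
            marks[-1] += ch
        else:
            bases.append(ch)
            marks.append("")
    return bases, marks


def attach_diacritics(bases, marks):
    """Inverse of split_diacritics: re-attach each mark string to its base."""
    return "".join(b + m for b, m in zip(bases, marks))


# "shalom" written with escapes: shin + shin dot + qamats, lamed, vav + holam, final mem
word = "\u05E9\u05C1\u05B8\u05DC\u05D5\u05B9\u05DD"
bases, marks = split_diacritics(word)
undotted = "".join(bases)                  # model input: bare letters
restored = attach_diacritics(bases, marks)  # training target, reassembled
```

Under this framing, a character-level sequence model would read `undotted` and predict one label per position, with `marks` serving as the targets extracted directly from the diacritized text.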
Taxonomy
Topics: Natural Language Processing Techniques · Handwritten Text Recognition Techniques · Digital Humanities and Scholarship
Methods: Adam · Sigmoid Activation · Tanh Activation · Long Short-Term Memory
