Hebrew Diacritics Restoration using Visual Representation

Yair Elboher; Yuval Pinter

arXiv:2510.26521 · cs.CL · February 5, 2026

TL;DR

This paper introduces DiVRit, a Hebrew diacritization system that uses a Hebrew Visual Language Model to process diacritized candidates as images. It reaches high accuracy in the oracle setting, where the correct diacritized form is among the candidates, without relying on explicit linguistic analysis.

Contribution

The work presents a new zero-shot classification approach for Hebrew diacritization using visual language models to embed diacritics directly in vector representations.

Findings

01 High accuracy in the oracle setting, where the correct diacritized form is among the candidates

02 Effective diacritization without complex linguistic analysis

03 Significant improvements through architectural and training optimizations

Abstract

Diacritics restoration in Hebrew is a fundamental task for ensuring accurate word pronunciation and disambiguating textual meaning. Despite the language's high degree of ambiguity when unvocalized, recent machine learning approaches have significantly advanced performance on this task. In this work, we present DiVRit, a novel system for Hebrew diacritization that frames the task as a zero-shot classification problem. Our approach operates at the word level, selecting the most appropriate diacritization pattern for each undiacritized word from a dynamically generated candidate set, conditioned on the surrounding textual context. A key innovation of DiVRit is its use of a Hebrew Visual Language Model to process diacritized candidates as images, allowing diacritic information to be embedded directly within their vector representations while the surrounding context remains…
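
The word-level selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the cosine-similarity scoring rule, the function names (`cosine`, `select_candidate`), and the toy vectors are all assumptions made for illustration. In DiVRit the candidate vectors would come from a Hebrew Visual Language Model applied to rendered images of the diacritized candidates; here a hypothetical lookup table stands in for that encoder, and transliterations stand in for the diacritized forms.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_candidate(
    context_vec: Vector,
    candidates: Sequence[str],
    embed_candidate: Callable[[str], Vector],
) -> str:
    """Return the candidate whose embedding best matches the context vector.

    Assumption: similarity to a context vector is used as the score. The
    source only states that a candidate is selected from a dynamically
    generated set, conditioned on the surrounding context.
    """
    if not candidates:
        raise ValueError("candidate set must not be empty")
    return max(candidates, key=lambda c: cosine(context_vec, embed_candidate(c)))


# Hypothetical stand-in for the visual encoder: three diacritizations of the
# same undiacritized word, keyed by transliteration, with made-up vectors.
TOY_EMBEDDINGS = {
    "sefer": [0.9, 0.1],  # "book"
    "safar": [0.1, 0.9],  # "he counted"
    "sapar": [0.5, 0.5],  # "barber"
}

# Made-up context vector; a real system would derive it from the sentence.
context = [1.0, 0.2]
best = select_candidate(context, list(TOY_EMBEDDINGS), TOY_EMBEDDINGS.__getitem__)
print(best)  # sefer
```

Because the candidate set is passed in at call time rather than fixed in advance, the same scoring function applies to any word, which is one way to read the zero-shot framing in the abstract.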
