Dicta-LM 3.0: Advancing The Frontier of Hebrew Sovereign LLMs

Shaltiel Shmidman; Avi Shmidman; Amir DN Cohen; Moshe Koppel

arXiv:2602.02104 · cs.CL · February 3, 2026

Open Access

TL;DR

Dicta-LM 3.0 introduces open-weight LLMs trained on Hebrew and English text in three sizes (24B, 12B, and 1.7B), each with a native context length of 65k tokens, and evaluates them on a new benchmark suite for Hebrew chat LLMs.

Contribution

The paper presents a new collection of Hebrew LLMs, released as base and tool-calling chat variants with a native context length of 65k tokens, along with a new benchmark suite for evaluating Hebrew chat LLMs.

Findings

01. Models achieve competitive performance on Hebrew NLP tasks.

02. Extended context length improves task handling.

03. Benchmark suite enables rigorous evaluation of Hebrew LLMs.

Abstract

Open-weight large language models (LLMs) have been released by frontier labs; however, sovereign LLMs (for languages other than English) remain low in supply yet high in demand. Training LLMs for low-resource languages such as Hebrew poses unique challenges. In this paper, we introduce Dicta-LM 3.0: an open-weight collection of LLMs trained on substantially sized corpora of Hebrew and English texts. The model is released in three sizes: 24B, adapted from the Mistral-Small-3.1 base model; 12B, adapted from the NVIDIA Nemotron Nano V2 model; and 1.7B, adapted from the Qwen3-1.7B base model. We are releasing multiple variants of each model, each with a native context length of 65k tokens: a base model and a chat model with tool-calling support. To rigorously evaluate our models, we introduce a new benchmark suite for evaluation of Hebrew chat-LLMs, covering a diverse set of…

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification