Mecellem Models: Turkish Models Trained from Scratch and Continually Pre-trained for the Legal Domain

Özgür Uğur; Mahmut Göksu; Mahmut Çimen; Musa Yılmaz; Esra Şavirdi; Alp Talha Demir; Rumeysa Güllüce; İclal Çetin; Ömer Can Sağbaş

arXiv:2601.16018 · cs.CL · January 23, 2026

Open Access · 9 Models · 3 Datasets

TL;DR

This paper introduces the Mecellem models, Turkish legal language models built by pre-training encoders from scratch and by continually pre-training decoders on legal text. The encoders rank in the top 3 on the Turkish retrieval leaderboard, and continual pre-training lowers perplexity on Turkish legal texts by 36.2%.

Contribution

The paper presents a framework for building Turkish legal-domain models with two components: an encoder pre-trained from scratch, with checkpoints selected by downstream retrieval performance, and a continually pre-trained decoder. Both are designed for training efficiency as well as performance.

Findings

01. Encoder models achieve top-3 Turkish retrieval leaderboard rankings.

02. The approach attains 92.36% production efficiency compared to state-of-the-art models.

03. Continual pre-training reduces perplexity by 36.2% on Turkish legal texts.
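
To make the perplexity figure concrete, the sketch below shows one common way a relative perplexity reduction is computed: perplexity as the exponential of the mean per-token negative log-likelihood, and the reduction as the drop relative to the baseline model. The function names and the example values are illustrative assumptions; this summary does not give the paper's evaluation setup or its raw perplexities.

```python
import math


def perplexity(token_nlls):
    """Perplexity = exp(mean negative log-likelihood per token, in nats)."""
    if not token_nlls:
        raise ValueError("need at least one token-level NLL")
    return math.exp(sum(token_nlls) / len(token_nlls))


def relative_reduction(baseline_ppl, adapted_ppl):
    """Fraction by which perplexity drops relative to the baseline model."""
    if baseline_ppl <= 0:
        raise ValueError("baseline perplexity must be positive")
    return (baseline_ppl - adapted_ppl) / baseline_ppl


# Hypothetical perplexities, chosen only to illustrate the arithmetic;
# they are NOT values reported for the Mecellem models.
baseline_ppl = 10.0
adapted_ppl = 6.38
reduction_pct = round(100 * relative_reduction(baseline_ppl, adapted_ppl), 1)
```

Under these made-up numbers, a drop from 10.0 to 6.38 corresponds to a 36.2% reduction, which is the form the finding above takes.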

Abstract

This paper presents the Mecellem models, a framework for developing specialized language models for the Turkish legal domain through domain adaptation strategies. We make two contributions: (1) Encoder Model Pre-trained from Scratch: ModernBERT-based bidirectional encoders pre-trained on a Turkish-dominant corpus of 112.7 billion tokens. We implement a checkpoint selection strategy that evaluates downstream retrieval performance throughout training, revealing that the optimal checkpoints achieve their best retrieval scores before the pre-training loss reaches its minimum. Our encoder models achieve top-3 rankings on the Turkish retrieval leaderboard, with smaller models (155M parameters) achieving performance comparable to larger reference models (307M-567M parameters). Our approach achieves 92.36% production efficiency compared to state-of-the-art models (embeddinggemma-300m: 100.00%, BAAI/bge-m3:…
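
The checkpoint selection idea described in the abstract can be sketched roughly as follows: record a downstream retrieval score for each saved checkpoint and keep the checkpoint with the best score, rather than the one with the lowest pre-training loss. The `Checkpoint` record, the function names, and every number in `history` are hypothetical and chosen only for illustration; the abstract does not specify the metric, the evaluation interval, or the selection code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Checkpoint:
    step: int               # training step at which the checkpoint was saved
    pretrain_loss: float    # pre-training loss at that step
    retrieval_score: float  # downstream retrieval metric (assumed: higher is better)


def select_by_retrieval(checkpoints):
    """Pick the checkpoint with the highest downstream retrieval score."""
    if not checkpoints:
        raise ValueError("no checkpoints to select from")
    return max(checkpoints, key=lambda c: c.retrieval_score)


def select_by_loss(checkpoints):
    """Pick the checkpoint with the lowest pre-training loss, for comparison."""
    if not checkpoints:
        raise ValueError("no checkpoints to select from")
    return min(checkpoints, key=lambda c: c.pretrain_loss)


# Made-up training history: the loss keeps falling, but the retrieval
# score peaks earlier. These are NOT results from the paper.
history = [
    Checkpoint(step=10_000, pretrain_loss=2.10, retrieval_score=0.41),
    Checkpoint(step=20_000, pretrain_loss=1.85, retrieval_score=0.52),
    Checkpoint(step=30_000, pretrain_loss=1.72, retrieval_score=0.57),
    Checkpoint(step=40_000, pretrain_loss=1.66, retrieval_score=0.55),
    Checkpoint(step=50_000, pretrain_loss=1.63, retrieval_score=0.53),
]

best_for_retrieval = select_by_retrieval(history)
lowest_loss = select_by_loss(history)
```

In this toy history the retrieval-selected checkpoint (step 30,000) comes before the lowest-loss checkpoint (step 50,000), mirroring the pattern the abstract reports. Selecting on the downstream metric costs an extra evaluation per checkpoint but avoids assuming that lower loss always means better retrieval.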

Taxonomy

Topics: Topic Modeling · Artificial Intelligence in Law · Natural Language Processing Techniques