Tokenizer Fertility and Zero-Shot Performance of Foundation Models on Ukrainian Legal Text: A Comparative Study

Volodymyr Ovcharov

arXiv:2605.14890·cs.CL·May 19, 2026

TL;DR

This study benchmarks foundation models on Ukrainian legal text, shows that tokenizer fertility materially affects both inference cost and performance, and argues for domain-specific evaluation when selecting models for zero-shot NLP tasks.

Contribution

It provides a comparative analysis of seven models on Ukrainian legal data, introduces a new dataset, and offers insights into tokenizer efficiency and zero-shot transfer in a low-resource language.

Findings

01

Qwen 3 models are less token-efficient than Llama models, consuming about 60% more tokens on identical input.

02

NVIDIA Nemotron Super 3 (120B) achieves the highest composite score, outperforming the much larger Mistral Large 3 at about one-third the API cost.

03

Few-shot prompting degrades performance on Ukrainian legal tasks by up to 26 percentage points.

Abstract

Tokenizer fertility varies 1.6x across foundation models on Ukrainian legal text, yet this cost-critical dimension is absent from model selection practice. We benchmark seven models from five providers on 273 validated court decisions from Ukraine's state registry (EDRSR), measuring tokenizer fertility and zero-shot performance on three tasks. Four findings emerge. (1) Qwen 3 models consume 60% more tokens than Llama-family models on identical input, making tokenizer analysis a prerequisite for cost-efficient deployment. (2) NVIDIA Nemotron Super 3 (120B) achieves the highest composite score (83.1), outperforming Mistral Large 3 (5.6x more total parameters) at one-third the API cost; model scale is a poor proxy for domain performance. (3) Few-shot prompting degrades performance by up to 26 percentage points; stratified and prompt-sensitivity ablations confirm this is intrinsic to…
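The abstract does not define fertility in this excerpt; a common definition is the average number of tokens a tokenizer emits per word. The minimal Python sketch below assumes that definition and shows why a fertility ratio translates directly into a token-cost ratio. All names (`count_words`, `fertility`, `relative_token_cost`, `whitespace_tokenize`, `chunk_tokenize`) are hypothetical, the word-splitting regex is an assumption rather than the paper's rule, and the two toy tokenizers are illustrative stand-ins, not the Qwen or Llama tokenizers.

```python
import re
from typing import Callable, Iterable, List

# Hypothetical word segmenter: runs of Unicode word characters (Cyrillic letters
# and digits included), optionally joined by an apostrophe as in Ukrainian "м'ясо".
# The paper's own word-counting rule is an assumption here.
WORD_RE = re.compile(r"\w+(?:['’]\w+)*")


def count_words(text: str) -> int:
    """Number of words in text under the regex above."""
    return len(WORD_RE.findall(text))


def fertility(texts: Iterable[str], tokenize: Callable[[str], List]) -> float:
    """Corpus-level fertility: total tokens emitted by `tokenize` / total words."""
    total_tokens = 0
    total_words = 0
    for text in texts:
        total_tokens += len(tokenize(text))
        total_words += count_words(text)
    if total_words == 0:
        raise ValueError("corpus contains no words")
    return total_tokens / total_words


def relative_token_cost(fert_a: float, fert_b: float,
                        price_a: float = 1.0, price_b: float = 1.0) -> float:
    """Cost of model A relative to model B on the same text.

    Token counts scale with fertility, so with equal per-token prices the
    cost ratio reduces to the fertility ratio.
    """
    return (fert_a * price_a) / (fert_b * price_b)


def whitespace_tokenize(text: str) -> List[str]:
    """Toy baseline: one token per word (fertility 1.0 by construction)."""
    return WORD_RE.findall(text)


def chunk_tokenize(text: str, size: int = 3) -> List[str]:
    """Toy stand-in for a subword tokenizer: split each word into fixed-size chunks."""
    return [w[i:i + size] for w in WORD_RE.findall(text)
            for i in range(0, len(w), size)]


if __name__ == "__main__":
    sample = ["Суд ухвалив рішення"]  # "The court adopted a decision"
    print(fertility(sample, whitespace_tokenize))       # 1.0
    print(round(fertility(sample, chunk_tokenize), 3))  # 2.333
    print(relative_token_cost(1.6, 1.0))                # 1.6
```

To measure a real model, one could pass a Hugging Face tokenizer's `encode` method as `tokenize`, e.g. `lambda s: tok.encode(s, add_special_tokens=False)`. The last line mirrors the abstract's arithmetic: a 1.6x fertility ratio means roughly 60% more tokens for the same document, and therefore roughly 60% higher token-billed cost when per-token prices are equal; actual API cost also depends on each provider's pricing.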
