Entropy of Ukrainian

Anton Lavreniuk; Mykyta Mudryi; Markiian Chaklosh

arXiv:2604.27534·cs.CL·May 1, 2026

TL;DR

This paper estimates the entropy of the Ukrainian language using a Shannon-inspired experiment with 184 volunteers, providing an upper bound and comparing it to large language models.

Contribution

First entropy measurement for Ukrainian using a Shannon-based experiment, with methods and code published for reproducibility.

Findings

01

Upper bound on the entropy of Ukrainian: approximately 1.201 bits per character

02

Methods and code for entropy estimation are documented and publicly available

03

Comparison of Ukrainian entropy with performance of current large language models
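A human estimate in bits per character can only be compared with a language model once the model's loss is expressed in the same unit. The sketch below shows one common conversion, assuming the model reports a negative log-likelihood in nats for each token; the function name `bits_per_character` and its inputs are hypothetical, and this page does not say how the paper itself performs the comparison.

```python
import math


def bits_per_character(token_nll_nats, text):
    """Convert per-token negative log-likelihoods (in nats) to bits per character.

    Assumption: token_nll_nats covers exactly the tokens that encode `text`.
    Characters are counted as Unicode code points, so each Cyrillic letter
    counts once (not once per UTF-8 byte).
    """
    if not text:
        raise ValueError("text must be non-empty")
    total_bits = sum(token_nll_nats) / math.log(2)  # nats -> bits
    return total_bits / len(text)


# Illustrative values only: four tokens, each costing exactly one bit,
# spread over an eight-character string.
example_bpc = bits_per_character([math.log(2)] * 4, "мова мов")
```

Normalizing by characters rather than tokens keeps the figure independent of the tokenizer, which is what makes it comparable to a character-guessing experiment.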

Abstract

In natural language processing, the entropy of a language is a measure of its unpredictability and complexity. The first study on this subject was conducted by Claude Shannon in 1951. By having participants predict the next character in a sentence, he was able to approximate the entropy of the English language. Several follow-up studies by other authors have since been conducted for English, and one for Hebrew. However, to date, Shannon's experiment has never been conducted for Ukrainian. In this paper, we perform this experiment for Ukrainian by recruiting 184 volunteers using social media channels. We rely on techniques used for English to approximate the entropy value of Ukrainian. The final result is an upper bound of $H_{\text{upper}} \approx 1.201$ bits per character. We compare this to the performance of current Large Language Models. The methods and code used are also documented and…
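To make the guessing experiment concrete, here is a minimal sketch of the bounds from Shannon's 1951 paper. If $q_i$ is the fraction of characters guessed correctly on the $i$-th attempt, the upper bound is $-\sum_i q_i \log_2 q_i$ and the lower bound is $\sum_i i\,(q_i - q_{i+1}) \log_2 i$. The function name `shannon_bounds` and the sample data are hypothetical, and the abstract does not confirm that the authors use exactly this estimator.

```python
import math
from collections import Counter


def shannon_bounds(guess_counts):
    """Return (lower, upper) entropy bounds in bits per character.

    guess_counts: one integer per predicted character, giving the number of
    guesses a participant needed before naming that character (1 = first try).
    """
    if not guess_counts:
        raise ValueError("guess_counts must be non-empty")
    n = len(guess_counts)
    freq = Counter(guess_counts)
    max_rank = max(freq)
    # q[i - 1] is the fraction of characters guessed on attempt i.
    q = [freq.get(i, 0) / n for i in range(1, max_rank + 1)]
    # Upper bound: entropy of the guess-rank distribution.
    upper = -sum(p * math.log2(p) for p in q if p > 0)
    # Lower bound: uses q_{max_rank + 1} = 0 as the final term.
    q_ext = q + [0.0]
    lower = sum(
        i * (q_ext[i - 1] - q_ext[i]) * math.log2(i)
        for i in range(1, max_rank + 1)
    )
    return lower, upper


# Made-up data: 4 characters guessed on the first try, 2 on the second,
# 2 on the third.
lower, upper = shannon_bounds([1, 1, 1, 1, 2, 2, 3, 3])
```

The lower-bound formula assumes the $q_i$ are non-increasing in $i$, which holds when participants try the most likely character first.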
