# The Amount of Data Required to Recognize a Writer’s Style Is Consistent Across Different Languages of the World

**Authors:** Boris Ryabko, Nadezhda Savina, Yeshewas Getachew Lulu, Yunfei Han

PMC · DOI: 10.3390/e27101039 · Entropy · 2025-10-04

## TL;DR

This study shows that the amount of text needed to identify a writer's style is similar across different languages, including Russian, Amharic, Chinese, and English.

## Contribution

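The study applies the compression-based RS-method to fiction texts in four languages from different language groups and families, and shows that the amount of text required to recognize an author is nearly the same in each.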

## Key findings

- The amount of data needed to recognize an author's style is nearly the same across four languages.
- The RS-method was successfully applied to fiction texts in Russian, Amharic, Chinese, and English.
- The results are relevant to computer science, literary studies, and computational linguistics.

## Abstract

In this paper, we apply an information-theoretic method proposed by Ryabko and Savina (hence called the RS-method), based on data compression, to recognize a writer’s individual style across four languages from different language groups and families. The method was used to study fiction texts in Russian (East Slavic group of the Indo-European language family), Amharic (South Ethiosemitic group of the Semitic language family), Chinese (Sinitic group of the Sino-Tibetan language family) and English (West Germanic group of the Indo-European language family). The amount of data necessary for recognizing an author’s style was found to be almost the same for all four languages, i.e., it is invariant across different language groups. The results are of interest to computer science, literary studies, linguistics and, in particular, computational linguistics.
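
To illustrate the general idea of compression-based author recognition mentioned in the abstract, here is a minimal sketch. It is not the RS-method itself, whose details are not given in this summary; it is a generic approach that assumes `zlib` as the compressor, and the helper names (`compressed_size`, `extra_bytes`, `closest_author`) are hypothetical. The intuition is that a sample compresses better when appended to text by the same author, because the compressor can reuse patterns it has already seen.

```python
import zlib
from typing import Mapping


def compressed_size(data: bytes) -> int:
    """Size in bytes of the zlib-compressed data (maximum compression level)."""
    return len(zlib.compress(data, 9))


def extra_bytes(reference: str, sample: str) -> int:
    """Additional compressed bytes needed to encode `sample` after `reference`.

    A smaller value suggests the sample shares more patterns with the reference.
    """
    ref = reference.encode("utf-8")
    both = ref + sample.encode("utf-8")
    return compressed_size(both) - compressed_size(ref)


def closest_author(references: Mapping[str, str], sample: str) -> str:
    """Return the author whose reference text makes the sample cheapest to encode.

    Assumption: `references` maps an author name to a known text by that author.
    """
    return min(references, key=lambda name: extra_bytes(references[name], sample))
```

Calling `closest_author` with a dictionary of known texts and an unattributed sample returns the name with the lowest extra compressed size. Note that `zlib` only looks back over a 32 KB window, so a sketch like this is only meaningful for short reference texts; a study of how much data is required would need a compressor and a statistical test chosen for that purpose.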

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12563418/full.md

## References

22 references — full list in the complete paper: https://tomesphere.com/paper/PMC12563418/full.md

---
Source: https://tomesphere.com/paper/PMC12563418