# Literal Pattern Analysis of Texts Written with the Multiple Form of Characters: A Comparative Study of the Human and Machine Styles

**Authors:** Kazuya Hayata

PMC · DOI: 10.3390/e28010036 · 2025-12-27

## TL;DR

This paper analyzes how Japanese texts mix three writing systems and compares human and machine translations using divergence and diversity measures, including normalized Shannon entropy.

## Contribution

A novel binary pattern approach is introduced to quantify character mixing in Japanese texts for stylometric analysis.
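As a rough illustration of the preprocessing such an analysis needs, the sketch below labels each character as hiragana, katakana, or kanji and counts script-to-script transitions. It is a sketch under stated assumptions, not the paper's procedure: this summary does not describe the binary encoding, the Unicode block boundaries are a simplification, and the names `script_of`, `script_sequence`, and `transition_counts` are hypothetical.

```python
from collections import Counter


def script_of(ch):
    """Return the script label of one character, or None if it is not
    hiragana, katakana, or kanji. Assumption: plain Unicode block ranges
    are a good enough classifier; rare extensions are ignored."""
    cp = ord(ch)
    if 0x3040 <= cp <= 0x309F:   # Hiragana block
        return "hiragana"
    if 0x30A0 <= cp <= 0x30FF:   # Katakana block
        return "katakana"
    if 0x4E00 <= cp <= 0x9FFF:   # CJK Unified Ideographs (kanji)
        return "kanji"
    return None


def script_sequence(text):
    """Map a text to its sequence of script labels, dropping punctuation,
    digits, Latin letters, and anything else outside the three scripts."""
    return [s for s in map(script_of, text) if s is not None]


def transition_counts(seq):
    """Count adjacent (from_script, to_script) pairs in a label sequence."""
    return Counter(zip(seq, seq[1:]))


seq = script_sequence("私はカメラを買った。")
pairs = transition_counts(seq)
```

The transition counts are one possible summary of how the scripts alternate. Any frequency table built this way can be fed to divergence or diversity measures.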

## Key findings

- Overall, machine-translated texts show higher normalized Shannon entropy in character mixing than human translations.
- The method identifies three clusters among 17 Japanese translations based on entropy.
- The approach is applicable to diverse Japanese texts, including the periodic table.

## Abstract

Aside from languages that have no written form, texts in almost every language are written in a single script. But every rule has its exceptions. A very rare exception is Japanese, whose texts are written in three kinds of characters. In European languages, by contrast, no one can find a text written in a mixture of the Latin, Cyrillic, and Greek alphabets. For several currently available Japanese texts, we conduct a quantitative analysis of how the three kinds of characters are mixed, using a methodology based on a binary pattern approach to the sequence that has been generated by a procedure. Specifically, we consider two texts, the former and present constitutions, as well as a famous American story that has been translated at least 13 times into Japanese. For the story, a comparison is made among the human translations and four machine translations by DeepL and Google Translate. As metrics of divergence and diversity, the Hellinger distance, chi-square value, normalized Shannon entropy, and Simpson’s diversity index are employed. Numerical results suggest that, in terms of entropy, the 17 translations form three clusters, and that overall the machine-translated texts exhibit higher entropy than the human translations. This finding suggests that the present method can provide a useful tool for stylometry and author attribution. Through comparison with the diversity index, the capabilities of the entropic measure are confirmed. Lastly, in addition to the abovementioned texts, applicability to the Japanese version of the periodic table of elements is investigated.
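The four metrics named in the abstract can be sketched with their textbook definitions. This is a minimal sketch under assumptions, not the paper's implementation: the summary does not say which variants are used, so the chi-square here is Pearson's statistic against expected counts, the entropy is normalized by the logarithm of the number of categories, and Simpson's index is taken in the Gini-Simpson form (one minus the sum of squared proportions). All function names are hypothetical.

```python
import math


def normalize(counts):
    """Convert raw pattern counts into a probability distribution."""
    total = sum(counts)
    return [c / total for c in counts]


def hellinger(p, q):
    """Hellinger distance between two distributions over the same
    categories; 0 for identical distributions, 1 for disjoint support."""
    return math.sqrt(
        sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)) / 2
    )


def chi_square(observed, expected):
    """Pearson chi-square statistic (assumed variant); expected counts
    must all be positive."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def normalized_entropy(p):
    """Shannon entropy divided by log(number of categories), so the
    result lies in [0, 1]. Zero-probability categories contribute 0."""
    n = len(p)
    if n < 2:
        return 0.0
    h = -sum(x * math.log(x) for x in p if x > 0)
    return h / math.log(n)


def simpson_diversity(p):
    """Gini-Simpson form of Simpson's diversity index (assumed variant)."""
    return 1 - sum(x * x for x in p)


uniform = normalize([5, 5, 5, 5])
skewed = normalize([17, 1, 1, 1])
```

On these toy counts, the uniform distribution reaches the maximum normalized entropy while the skewed one scores lower on both entropy and Simpson's index. That is the kind of agreement between the two diversity measures that the abstract refers to.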

## Full-text entities

- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

18 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12840514/full.md

---
Source: https://tomesphere.com/paper/PMC12840514