Literal Pattern Analysis of Texts Written with the Multiple Form of Characters: A Comparative Study of the Human and Machine Styles
Kazuya Hayata

TL;DR
This paper analyzes how Japanese texts mix three writing systems and compares human and machine translations using statistical measures.
Contribution
A novel binary pattern approach is introduced to quantify character mixing in Japanese texts for stylometric analysis.
Findings
Machine-translated texts show higher entropy than human translations in character mixing.
The method identifies three clusters among 17 Japanese translations based on entropy.
The approach is applicable to diverse Japanese texts, including the periodic table.
Abstract
Aside from languages that have no written form, texts in almost every language are written in a single kind of character. But every rule has its exceptions. A very rare exception is Japanese, whose texts are written in three kinds of characters. By contrast, no text in a European language is written in a mixture of the Latin, Cyrillic, and Greek alphabets. For several currently available Japanese texts, we conduct a quantitative analysis of how the three kinds of characters are mixed, using a methodology based on a binary pattern approach to a sequence generated by a procedure. Specifically, we consider two different texts, those of the former and present constitutions, as well as a famous American story that has been translated into Japanese at least 13 times. For the latter, a comparison is made among the human translations and four…
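The abstract does not spell out the procedure that generates the sequence, so the following is only a minimal sketch of one way such an analysis could be set up, not the author's method. It assumes that characters are classified by Unicode block, that kanji map to 1 and kana (hiragana or katakana) map to 0, and that mixing is summarized by the Shannon entropy of overlapping length-n binary patterns. The function names `script_of`, `binary_sequence`, and `block_entropy` are hypothetical.

```python
import math
from collections import Counter


def script_of(ch):
    """Classify one character by Unicode block (an assumed, simplified rule).

    Returns 'hiragana', 'katakana', 'kanji', or None for anything else
    (Latin letters, digits, punctuation, whitespace).
    """
    cp = ord(ch)
    if 0x3041 <= cp <= 0x309F:   # Hiragana block
        return "hiragana"
    if 0x30A0 <= cp <= 0x30FF:   # Katakana block
        return "katakana"
    if 0x4E00 <= cp <= 0x9FFF:   # CJK Unified Ideographs (basic block only)
        return "kanji"
    return None


def binary_sequence(text):
    """Map kanji to 1 and kana to 0, skipping all other characters.

    This binarization is an assumption made for illustration.
    """
    scripts = (script_of(ch) for ch in text)
    return [1 if s == "kanji" else 0 for s in scripts if s is not None]


def block_entropy(bits, n=2):
    """Shannon entropy, in bits, of overlapping length-n binary patterns."""
    blocks = [tuple(bits[i:i + n]) for i in range(len(bits) - n + 1)]
    if not blocks:
        return 0.0
    total = len(blocks)
    counts = Counter(blocks)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    sample = "日本語のテキスト"  # three kanji, one hiragana, four katakana
    bits = binary_sequence(sample)
    print(bits)
    print(block_entropy(bits, n=2))
```

Under these assumptions, a text that alternates kanji and kana more irregularly yields a higher pattern entropy than one with long uniform runs, which is the kind of quantity the Findings compare across translations.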
Taxonomy
Topics: Authorship Attribution and Profiling · Text Readability and Simplification · Biomedical Text Mining and Ontologies
