Probing the statistical properties of unknown texts: application to the Voynich Manuscript

Diego R. Amancio; Eduardo G. Altmann; Diego Rybski; Osvaldo N. Oliveira Jr.; Luciano da F. Costa

arXiv:1303.0347 · physics.soc-ph · July 4, 2013

TL;DR

This paper introduces a statistical framework for analyzing unknown texts such as the Voynich Manuscript. The framework determines whether a text is compatible with natural languages and identifies keywords without any knowledge of what the words mean.

Contribution

It proposes a multi-faceted statistical approach for assessing whether a text has the properties of a natural language. The approach applies even to undeciphered manuscripts.

Findings

01

The Voynich Manuscript is compatible with natural languages

02

Statistical measurements can distinguish real texts from shuffled versions

03

Candidate keywords were identified for the Voynich Manuscript

Abstract

Statistical physics methods applied to large corpora have unveiled many patterns in texts. However, no comprehensive investigation has yet examined how statistical measurements behave across different languages and texts. In this study we propose a framework that aims to determine whether a text is compatible with a natural language and which languages are closest to it, without any knowledge of the meaning of the words. The approach is based on three types of statistical measurements. The first type comes from first-order statistics of word properties in a text. The second comes from the topology of complex networks representing the text. The third comes from intermittency concepts, in which the text is treated as a time series. Comparative experiments were performed with the New Testament in 15 different languages and with distinct books in English and Portuguese in order to quantify the…
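
The three families of measurements named in the abstract can be illustrated with a minimal Python sketch. Each measure below is an illustrative choice made here. These choices are assumptions and not necessarily the paper's exact definitions:

- the type-token ratio serves as a first-order statistic;
- the mean degree of a word-adjacency network serves as a network statistic;
- the coefficient of variation of the gaps between successive occurrences of a word serves as an intermittency index.

All function names are hypothetical. A shuffled copy of the text acts as a null model, in the spirit of the comparison with shuffled versions listed among the findings.

```python
import random
import statistics
from collections import defaultdict


def type_token_ratio(tokens):
    """First-order statistic (illustrative): distinct words / total words."""
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


def adjacency_average_degree(tokens):
    """Network statistic (illustrative): link each pair of consecutive,
    distinct words with an undirected edge and return the mean degree
    over all distinct words in the text."""
    neighbors = defaultdict(set)
    for a, b in zip(tokens, tokens[1:]):
        if a != b:  # skip self-loops such as "the the"
            neighbors[a].add(b)
            neighbors[b].add(a)
    nodes = set(tokens)
    if not nodes:
        return 0.0
    return sum(len(neighbors[w]) for w in nodes) / len(nodes)


def intermittency(tokens, word):
    """Time-series statistic (illustrative): std/mean of the gaps between
    successive occurrences of `word`. Values near 0 indicate regular
    spacing; larger values indicate bursty, clustered usage."""
    positions = [i for i, t in enumerate(tokens) if t == word]
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    if len(gaps) < 2:
        return None  # too few recurrences to estimate the spread
    return statistics.pstdev(gaps) / statistics.mean(gaps)


def shuffled(tokens, seed=0):
    """Null model: the same words in a random order (hypothetical helper)."""
    rng = random.Random(seed)
    copy = list(tokens)
    rng.shuffle(copy)
    return copy


if __name__ == "__main__":
    text = "the cat sat on the mat and the dog sat on the log".split()
    for label, seq in [("original", text), ("shuffled", shuffled(text, seed=1))]:
        print(label,
              round(type_token_ratio(seq), 3),
              round(adjacency_average_degree(seq), 3),
              intermittency(seq, "the"))
```

Shuffling only permutes word order, so it leaves first-order statistics such as the type-token ratio unchanged. Order-dependent quantities can differ between a text and its shuffled copy, which makes them candidates for separating the two. Examples include adjacency-network properties and the spacing of word occurrences.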
