Character Entropy in Modern and Historical Texts: Comparison Metrics for an Undeciphered Manuscript
Luke Lindemann, Claire Bowern

TL;DR
This study builds three corpora for comparing the Voynich manuscript's script with other languages and writing systems. The comparison shows that Voynichese characters are unusually predictable (low conditional character entropy) and follow constraints that differ from those of natural languages and other scripts.
Contribution
It introduces new multilingual corpora for Voynich analysis and shows that the predictability of Voynichese is not an artifact of the transcription system or the underlying language, but more likely reflects constraints of the script.
Findings
Voynichese has high character predictability not explained by language or cipher.
Character placement constraints suggest loss of phonemic distinctions.
The corpora enable comparative analysis of script and language features.
Abstract
This paper outlines the creation of three corpora for multilingual comparison and analysis of the Voynich manuscript: a corpus of Voynich texts partitioned by Currier language, scribal hand, and transcription system, a corpus of 294 language samples compiled from Wikipedia, and a corpus of eighteen transcribed historical texts in eight languages. These corpora will be utilized in subsequent work by the Voynich Working Group at Yale University. We demonstrate the utility of these corpora for studying characteristics of the Voynich script and language, with an analysis of conditional character entropy in Voynichese. We discuss the interaction between character entropy and language, script size and type, glyph compositionality, scribal conventions and abbreviations, positional character variants, and bigram frequency. This analysis characterizes the interaction between script…
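To make the central metric concrete, here is a minimal sketch of conditional character entropy: the uncertainty, in bits, about the next character given the previous one, estimated as H(prev, next) - H(prev) from adjacent character pairs. This is an illustration, not the authors' code. The function names are hypothetical, and the sketch assumes that every character, including spaces, counts as a symbol and that no smoothing is applied.

```python
from collections import Counter
from math import log2


def shannon_entropy(counts):
    """Shannon entropy in bits of the distribution given by a Counter."""
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())


def conditional_char_entropy(text):
    """Estimate H(next | prev) = H(prev, next) - H(prev) over adjacent pairs.

    Assumption: every character (spaces included) is treated as a symbol.
    """
    pairs = Counter(zip(text, text[1:]))  # counts of adjacent (prev, next)
    if not pairs:
        return 0.0  # fewer than two characters: nothing to predict
    firsts = Counter(text[:-1])  # marginal counts of the 'prev' position
    return shannon_entropy(pairs) - shannon_entropy(firsts)


if __name__ == "__main__":
    # A strictly alternating string is fully predictable from one character.
    print(conditional_char_entropy("abababab"))
    # Ordinary prose leaves more uncertainty about the next character.
    print(conditional_char_entropy("the quick brown fox jumps over the lazy dog"))
```

Lower values mean the next character is easier to guess from the current one, which is the sense in which the summary calls Voynichese highly predictable. Estimates from short samples are biased downward, so comparisons are only meaningful between texts of similar length.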
Taxonomy
Topics: Intelligence, Security, War Strategy
