Statistical analysis of the Indus script using $n$-grams
Nisha Yadav, Hrishikesh Joglekar, Rajesh P. N. Rao, M. N. Vahia, Iravatham Mahadevan, R. Adhikari

TL;DR
This study applies $n$-gram Markov chains to the Indus script and finds structured syntactic features and intermediate information content, offering insight into its possible linguistic nature without attempting a decipherment.
Contribution
It applies statistical language processing, specifically $n$-gram Markov chains, to the syntax of the Indus script, identifying structured patterns without assuming any semantic content.
Findings
Well-defined signs begin and end texts.
The sign order shows directionality and strong correlations.
Groups of signs appear to share the same syntactic function.
Abstract
The Indus script is one of the major undeciphered scripts of the ancient world. The small size of the corpus, the absence of bilingual texts, and the lack of definite knowledge of the underlying language have frustrated efforts at decipherment since the discovery of the remains of the Indus civilisation. Recently, some researchers have questioned the premise that the Indus script encodes spoken language. Building on previous statistical approaches, we apply the tools of statistical language processing, specifically $n$-gram Markov chains, to analyse the Indus script for syntax. Our main results are that the script has well-defined signs which begin and end texts, that there is directionality and strong correlations in the sign order, and that there are groups of signs which appear to have identical syntactic function. All these require no {\it a priori} suppositions regarding the…
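To make the idea of an $n$-gram Markov chain concrete, here is a minimal Python sketch for the bigram case ($n = 2$). It estimates the conditional probability of a sign given the preceding sign from raw counts, and shows how text-beginning signs, text-ending signs, and directionality can surface in those probabilities. The corpus, the sign labels `A` to `D`, the boundary markers, and the function names are all hypothetical, chosen only for illustration; the unsmoothed maximum-likelihood estimate used here is an assumption and not necessarily the estimator used in the paper.

```python
from collections import Counter

# Toy corpus: each text is a sequence of made-up sign labels.
# These are NOT real Indus signs or real inscriptions.
texts = [
    ["A", "B", "C"],
    ["A", "B", "D"],
    ["A", "C", "D"],
    ["B", "C", "D"],
]

# Hypothetical boundary markers added around every text.
START, END = "<s>", "</s>"


def bigram_counts(corpus):
    """Count contexts and (previous, next) pairs over a padded corpus."""
    contexts = Counter()
    bigrams = Counter()
    for text in corpus:
        padded = [START] + list(text) + [END]
        contexts.update(padded[:-1])              # every token that has a successor
        bigrams.update(zip(padded, padded[1:]))   # adjacent pairs
    return contexts, bigrams


def cond_prob(contexts, bigrams, prev, nxt):
    """Unsmoothed maximum-likelihood estimate of P(nxt | prev)."""
    if contexts[prev] == 0:
        return 0.0
    return bigrams[(prev, nxt)] / contexts[prev]


contexts, bigrams = bigram_counts(texts)

# A sign that tends to begin texts has a high probability after START.
p_a_begins = cond_prob(contexts, bigrams, START, "A")

# A sign that tends to end texts has a high probability of being followed by END.
p_d_ends = cond_prob(contexts, bigrams, "D", END)

# Directionality shows up as asymmetry: P(B | A) differs from P(A | B).
p_b_after_a = cond_prob(contexts, bigrams, "A", "B")
p_a_after_b = cond_prob(contexts, bigrams, "B", "A")

print(p_a_begins, p_d_ends, p_b_after_a, p_a_after_b)
```

In this toy corpus, `A` usually follows the start marker and `D` is always followed by the end marker, while `B` follows `A` but `A` never follows `B`. A real analysis of a small corpus would also need some form of smoothing for unseen sign pairs, which this sketch omits.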
