How Non-Linguistic Is the Indus Sign System? A Synthetic-Baseline Scorecard

Ashish Nair

arXiv:2604.17828 · cs.CL · April 21, 2026

TL;DR

This study tests whether the Indus sign system behaves like language by comparing it against two synthetic non-linguistic baselines. The corpus matches neither baseline cleanly and instead falls between the two.

Contribution

Introduces a multi-metric discrimination framework that scores the Indus corpus against synthetic non-linguistic baselines on four properties: text brevity, repeated formulaic phrases, hapax legomenon rate, and positional rigidity.

Findings

01

The Indus corpus matches neither non-linguistic baseline.

02

Indus signs occupy an intermediate position between the two baselines.

03

No real-world non-linguistic corpus fully reproduces the Indus statistical profile.

Abstract

Whether the Indus Valley sign system (c. 2600-1900 BCE) encodes spoken language has been debated for decades. This paper introduces a multi-metric discrimination framework that tests the observed Indus corpus against two kinds of computer-generated non-linguistic baseline -- one mimicking a heraldic emblem system, the other an administrative coding system -- each calibrated with Zipfian frequency distributions, positional constraints, and bigram dependencies derived from six attested non-linguistic corpora. The scorecard evaluates four properties central to the Farmer-Sproat-Witzel (2004) critique: text brevity, repeated formulaic phrases, hapax legomenon rate, and positional rigidity. Applying this framework to 1,916 deduplicated inscriptions (584 unique signs, 11,110 tokens) from the ICIT/Yajnadevam digitization, we find that the Indus corpus does not match either baseline cleanly.…
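
To make the four scorecard properties concrete, the following is a minimal Python sketch of how such statistics could be computed over a list of sign sequences. It is not the paper's implementation: the function name `scorecard`, the metric definitions (mean length, share of once-only sign types, share of recurring bigrams, share of tokens in a sign's most common slot), and the toy corpus are all assumptions chosen for illustration.

```python
from collections import Counter


def scorecard(inscriptions):
    """Illustrative statistics for a list of non-empty sign sequences.

    All four definitions are simplifying assumptions, not the paper's.
    """
    tokens = [sign for ins in inscriptions for sign in ins]
    sign_counts = Counter(tokens)

    # Text brevity: mean number of signs per inscription.
    mean_length = len(tokens) / len(inscriptions)

    # Hapax legomenon rate: share of sign types that occur exactly once.
    hapax_rate = sum(1 for c in sign_counts.values() if c == 1) / len(sign_counts)

    # Formulaic repetition: share of bigram tokens whose bigram type recurs.
    bigrams = Counter(
        (ins[i], ins[i + 1]) for ins in inscriptions for i in range(len(ins) - 1)
    )
    total_bigrams = sum(bigrams.values())
    repeated_bigram_share = (
        sum(c for c in bigrams.values() if c > 1) / total_bigrams
        if total_bigrams
        else 0.0
    )

    # Positional rigidity: share of tokens that sit in their sign's most
    # common slot (initial, medial, or final).
    slots = {}
    for ins in inscriptions:
        last = len(ins) - 1
        for i, sign in enumerate(ins):
            slot = "initial" if i == 0 else "final" if i == last else "medial"
            slots.setdefault(sign, Counter())[slot] += 1
    positional_rigidity = sum(max(c.values()) for c in slots.values()) / len(tokens)

    return {
        "mean_length": mean_length,
        "hapax_rate": hapax_rate,
        "repeated_bigram_share": repeated_bigram_share,
        "positional_rigidity": positional_rigidity,
    }


# Hypothetical toy corpus; the letters stand in for sign identifiers.
toy = [["A", "B", "C"], ["A", "B"], ["D", "B", "C"], ["A", "E"]]
result = scorecard(toy)
print(result)
```

Under these assumed definitions, the same function could be run on the observed corpus and on each synthetic baseline, and the resulting values compared property by property.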
