Statistical patterns of word frequency suggesting the probabilistic nature of human languages
Shuiyuan Yu, Chunshan Xu, Haitao Liu

TL;DR
This paper presents evidence from authentic language data indicating that human languages are inherently probabilistic, with linguistic universals, diachronic drift, and variations explained through statistical patterns.
Contribution
It demonstrates that key linguistic phenomena can be modeled as probability and frequency patterns, supporting the view of language as a probabilistic system.
Findings
Linguistic universals can be expressed as probability patterns.
Language change over time reflects diachronic frequency shifts.
Language variations are explained by statistical differences.
Abstract
Traditional linguistic theories have largely regarded language as a formal system composed of rigid rules. However, their failures in processing real language, the recent successes of statistical natural language processing, and the findings of many psychological experiments suggest that language may be more a probabilistic system than a formal system, and thus cannot be faithfully modeled with the either/or rules of formal linguistic theory. The present study, based on authentic language data, confirms that important linguistic issues such as linguistic universals, diachronic drift, and language variation can be translated into probability and frequency patterns in parole. These findings suggest that human languages may well be probabilistic systems by nature, and that statistical patterns may well be inherent properties of human languages.
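As a rough illustration of what "frequency patterns" can mean in practice, the sketch below computes relative word frequencies for two text samples and the change in one word's frequency between them. It is not the authors' method: the function names and the toy sentences are hypothetical, chosen only to show the arithmetic behind a diachronic frequency shift.

```python
from collections import Counter


def relative_frequencies(tokens):
    """Map each word to its share of all tokens in the sample."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def frequency_shift(earlier_tokens, later_tokens, word):
    """Change in a word's relative frequency from the earlier to the later sample."""
    earlier = relative_frequencies(earlier_tokens)
    later = relative_frequencies(later_tokens)
    # A word absent from a sample has relative frequency 0.
    return later.get(word, 0.0) - earlier.get(word, 0.0)


# Toy, made-up samples standing in for two periods of a corpus (assumption).
earlier_sample = "thou art kind and thou art wise".split()
later_sample = "you are kind and you are wise".split()

shift_thou = frequency_shift(earlier_sample, later_sample, "thou")
shift_you = frequency_shift(earlier_sample, later_sample, "you")
```

A negative shift means the word became relatively rarer in the later sample, and a positive shift means it became more common. A real study would use large, dated corpora and a significance test rather than two short sentences.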
Taxonomy
Topics: Natural Language Processing Techniques · Language and cultural evolution · Syntax, Semantics, Linguistic Variation
