The 'Letter' Distribution in the Chinese Language
Qinghua Chen, Yan Wang, Mengmeng Wang, Xiaomeng Li

TL;DR
This study investigates the statistical distribution of the basic elements of written Chinese and compares it with letter distributions in alphabetic languages, finding that the form of the distribution is consistent across historical periods and language types.
Contribution
It demonstrates that the constructive parts of written Chinese follow statistical laws similar to those of letters in alphabetic languages, extending our understanding of language universals.
Findings
Chinese constructive parts share distribution patterns with alphabetic languages
Distribution form remains consistent across Chinese historical periods
Basic particles' usage intensity varies over time
Abstract
Corpus-based statistical analysis plays a significant role in linguistic research, and ample evidence has shown that different languages exhibit some common laws. Studies have found that letters in some alphabetic writing languages have strikingly similar statistical usage frequency distributions. Does this hold for Chinese, which employs ideogram writing? We obtained letter frequency data of some alphabetic writing languages and found the common law of the letter distributions. In addition, we collected Chinese literature corpora for different historical periods from the Tang Dynasty to the present, and we dismantled the Chinese written language into three kinds of basic particles: characters, strokes and constructive parts. The results of the statistical analysis showed that, in different historical periods, the intensity of the use of basic particles in Chinese writing varied, but…
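The kind of frequency analysis the abstract describes can be illustrated with a minimal sketch. It assumes the simplest of the three particle types, whole characters, and a tiny hypothetical sample string rather than the authors' corpora. The function name `rank_frequency` and the restriction to the basic CJK Unified Ideographs block are assumptions made for illustration, not the paper's method. Counting strokes or constructive parts would additionally require a character decomposition table, which is not given here.

```python
from collections import Counter


def rank_frequency(text):
    """Return (character, relative frequency) pairs, most frequent first."""
    # Assumption: keep only the basic CJK Unified Ideographs block,
    # which drops punctuation, digits, Latin letters and whitespace.
    chars = [ch for ch in text if "\u4e00" <= ch <= "\u9fff"]
    counts = Counter(chars)
    total = sum(counts.values())
    if total == 0:
        return []
    # Sort by descending count; break ties by code point so the order is stable.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(ch, n / total) for ch, n in ranked]


# Hypothetical sample text, used only to exercise the function.
sample = "人之初，性本善。性相近，习相远。"
dist = rank_frequency(sample)
for rank, (ch, freq) in enumerate(dist[:3], start=1):
    print(rank, ch, round(freq, 3))
```

Plotting frequency against rank for corpora from different periods, or for different particle types, is one way to compare the shape of the distributions that the abstract refers to.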
Taxonomy
Topics: Natural Language Processing Techniques · Syntax, Semantics, Linguistic Variation · Linguistic Variation and Morphology
