Quantitative Entropy Study of Language Complexity
R.R. Xie, W.B. Deng, D.J. Wang, L.P. Csernai

TL;DR
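This paper measures the entropy of Chinese and English texts as an indicator of language complexity. It finds significant differences between the two languages and between the personal styles of debating partners, and it proposes applying the same analysis to a single individual at different ages and to different groups of the population.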
Contribution
It introduces a quantitative, entropy-based method for comparing language complexity across languages, personal styles, and groups of the population.
Findings
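Chinese and English texts differ significantly in entropy, measured on characters for Chinese and on words for both languages.
The personal styles of debating partners also produce significantly different entropy values.
The same analysis could be applied to a single individual at different ages and to different groups of the population.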
Abstract
We study the entropy of Chinese and English texts, based on characters in the case of Chinese texts and on words for both languages. Significant differences are found between the languages and between the personal styles of different debating partners. The entropy analysis points in the direction of lower entropy, that is, of higher complexity. Such a text analysis could be applied to individuals of different styles, to a single individual at different ages, and to different groups of the population.
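As a rough illustration of the kind of measurement the abstract describes, the sketch below computes the Shannon entropy of a text from its character or word frequencies. It is a minimal sketch under stated assumptions, not the authors' procedure: the abstract does not specify the estimator, so the plain frequency-based (plug-in) Shannon formula is assumed here, and the function names `shannon_entropy`, `char_entropy`, and `word_entropy` are hypothetical. Whitespace splitting is only a reasonable word tokenizer for English; Chinese text would need a separate word segmenter, which is not shown.

```python
import math
from collections import Counter


def shannon_entropy(tokens):
    """Plug-in Shannon entropy (bits per token) of an iterable of tokens.

    Assumption: probabilities are estimated as relative frequencies.
    """
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # an empty text carries no information
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def char_entropy(text):
    """Entropy over characters, ignoring whitespace (e.g. for Chinese text)."""
    return shannon_entropy(ch for ch in text if not ch.isspace())


def word_entropy(text):
    """Entropy over whitespace-separated, lowercased words.

    Assumption: whitespace tokenization, which suits English but not
    unsegmented Chinese text.
    """
    return shannon_entropy(text.lower().split())
```

Comparing two speakers or two languages would then amount to calling the same function on each corpus and comparing the resulting values, ideally on samples of similar length, since frequency-based estimates depend on sample size.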
Taxonomy
Topics: Advanced Text Analysis Techniques · Authorship Attribution and Profiling
