Statistical Patterns in Written Language
Damián H. Zanette

TL;DR
This paper reviews recent advances in quantitative linguistics, highlighting how statistical and information-theoretic methods reveal medium- and long-range patterns in human language, offering new insights into its complex structure.
Contribution
It reviews recent findings from applying statistical physics and information theory to language patterns that lie beyond the scope of traditional linguistics.
Findings
Identification of medium- and long-range organizational features in language
Application of statistical and information-theoretic techniques to linguistic data
Emergence of regularities and correlations in language streams
Abstract
Quantitative linguistics has been allowed, in the last few decades, within the admittedly blurry boundaries of the field of complex systems. A growing host of applied mathematicians and statistical physicists devote their efforts to disclosing regularities, correlations, patterns, and structural properties of language streams, using techniques borrowed from statistics and information theory. Overall, results can still be categorized as modest, but the prospects are promising: medium- and long-range features in the organization of human language, which are beyond the scope of traditional linguistics, have already emerged from this kind of analysis and continue to be reported, contributing a new perspective to our understanding of this most complex communication system. This short book is intended to review some of these recent contributions.
Taxonomy
Topics: Fractal and DNA sequence analysis · Machine Learning in Bioinformatics · Algorithms and Data Compression
