Strahler Number of Natural Language Sentences in Comparison with Random Trees
Kumiko Tanaka-Ishii, Akira Tanaka

TL;DR
This paper measures the Strahler number of natural language sentence trees, finds that it is typically 3 or 4, interprets it as a lower limit on the memory needed to process a sentence, and compares the result with the Strahler number of random trees.
Contribution
It proposes computing upper and lower limits of the Strahler number for sentence tree structures and links the number to memory constraints in sentence processing.
Findings
The Strahler number of sentences is usually 3 or 4.
It grows logarithmically with sentence length.
It does not differ significantly between natural language sentences and random trees.
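To make the quantity concrete, here is a minimal sketch of the standard recursive Strahler number on a binary tree: a leaf has value 1, and a node whose two children have values a and b has value a + 1 if a == b, otherwise max(a, b). This is only the textbook definition, not the paper's procedure for upper and lower limits on annotated sentence structures. The tree encoding (nested pairs) and the helper names `strahler`, `perfect_tree`, and `right_chain` are assumptions made for illustration.

```python
def strahler(tree) -> int:
    """Strahler number of a binary tree given as nested 2-tuples.

    Assumed encoding (illustrative only): any non-tuple value is a leaf,
    and an internal node is a pair (left, right).
    """
    if not isinstance(tree, tuple):
        return 1  # a leaf has Strahler number 1
    left, right = tree
    a, b = strahler(left), strahler(right)
    # Equal children raise the number by one; otherwise the larger one wins.
    return a + 1 if a == b else max(a, b)


def perfect_tree(depth: int):
    """Hypothetical helper: perfectly balanced tree with 2**depth leaves."""
    if depth == 0:
        return "w"
    sub = perfect_tree(depth - 1)
    return (sub, sub)


def right_chain(n_leaves: int):
    """Hypothetical helper: fully right-branching tree with n_leaves leaves."""
    tree = "w"
    for _ in range(n_leaves - 1):
        tree = ("w", tree)
    return tree


if __name__ == "__main__":
    print(strahler(right_chain(10)))  # a pure chain never exceeds 2
    print(strahler(perfect_tree(3)))  # 8 leaves, perfectly balanced
```

Under this definition a tree needs at least 2^(k-1) leaves to reach Strahler number k, which is consistent with the logarithmic growth in sentence length noted above, while a purely one-sided chain stays at 2 however long it is.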
Abstract
The Strahler number was originally proposed to characterize the complexity of river bifurcation and has found various applications. This article proposes computation of the Strahler number's upper and lower limits for natural language sentence tree structures. Through empirical measurements across grammatically annotated data, the Strahler number of natural language sentences is shown to be almost 3 or 4, similarly to the case of river bifurcation as reported by Strahler (1957). From the theory behind the number, we show that it is one kind of lower limit on the amount of memory required to process sentences. We consider the Strahler number to provide reasoning that explains reports showing that the number of required memory areas to process sentences is 3 to 4 for parsing (Schuler et al., 2010), and reports indicating a psychological "magical number" of 3 to 5 (Cowan, 2001). An…
Taxonomy
Topics: Natural Language Processing Techniques · Language and cultural evolution · Topic Modeling
