Rethinking the Relationship between the Power Law and Hierarchical Structures
Kai Nakaishi, Ryo Yoshida, Kohei Kajikawa, Koji Hukushima, Yohei Oseki

TL;DR
This study critically examines the argument that power-law decay of correlations in language corpora indicates hierarchical syntactic structure, and finds that the argument's assumptions do not hold for natural-language parse trees.
Contribution
The paper empirically tests the proposed link between power laws and hierarchical syntax, challenges previous interpretations, and argues that the link needs to be reconsidered.
Findings
Power-law decay of correlations does not align with syntactic structures.
Assumptions linking power laws to hierarchical syntax are not supported by data.
Reconsideration of the relationship between power laws and language structure is necessary.
Abstract
Statistical analysis of corpora provides an approach to quantitatively investigate natural languages. This approach has revealed that several power laws consistently emerge across different corpora and languages, suggesting universal mechanisms underlying languages. In particular, the power-law decay of correlations has been interpreted as evidence of underlying hierarchical structures in syntax, semantics, and discourse. This perspective has also been extended beyond corpora produced by human adults, including child speech, birdsong, and chimpanzee action sequences. However, the argument supporting this interpretation has not been empirically tested in natural languages. To address this gap, the present study examines the validity of the argument for syntactic structures. Specifically, we test whether the statistical properties of parse trees align with the assumptions in the argument.…
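To make "power-law decay of correlations" concrete, the sketch below estimates the mutual information between symbols separated by a distance d and fits a slope on a log-log scale. This is an illustrative assumption, not the authors' procedure: the function names `mutual_information` and `fit_power_law_exponent` are hypothetical, mutual information is only one possible measure of correlation, and the plug-in estimator used here tends to overestimate on short sequences.

```python
import math
from collections import Counter


def mutual_information(seq, d):
    """Plug-in estimate (in nats) of the mutual information between
    symbols that are d positions apart in seq. Illustrative only."""
    pairs = list(zip(seq, seq[d:]))
    n = len(pairs)
    if d < 1 or n == 0:
        raise ValueError("d must satisfy 1 <= d < len(seq)")
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), c in joint.items():
        # p(a, b) * log( p(a, b) / (p(a) * p(b)) ), written with raw counts
        mi += (c / n) * math.log(c * n / (left[a] * right[b]))
    return mi


def fit_power_law_exponent(distances, values):
    """Least-squares slope of log(value) against log(distance).
    For values proportional to d**(-alpha), the slope is -alpha."""
    pts = [(math.log(d), math.log(v))
           for d, v in zip(distances, values) if d > 0 and v > 0]
    if len(pts) < 2:
        raise ValueError("need at least two positive points to fit a slope")
    mean_x = sum(x for x, _ in pts) / len(pts)
    mean_y = sum(y for _, y in pts) / len(pts)
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    if sxx == 0:
        raise ValueError("distances must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    return sxy / sxx
```

In use, one would compute `mutual_information(tokens, d)` over a range of distances and pass the results to `fit_power_law_exponent`. An approximately straight line on the log-log plot is what is usually described as power-law decay, although a straight-line fit alone does not establish that a power law is the best description of the data.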
Taxonomy
Topics: Syntax, Semantics, Linguistic Variation · Language Development and Disorders · Language and cultural evolution
Methods: ALIGN
