The challenges of statistical patterns of language: the case of Menzerath's law in genomes
Ramon Ferrer-i-Cancho, Núria Forns, Antoni Hernández-Fernández, Gemma Bel-Enguix, Jaume Baixeries

TL;DR
This paper examines Menzerath's law in language, music, and genomes and responds to two criticisms of its genomic instance: that the law is inevitable and that non-coding DNA dominates genomes. It argues that the law is not inevitable and that languages also have an analogue of non-coding DNA.
Contribution
It discusses the mathematical, statistical and conceptual challenges facing two criticisms of Menzerath's law in genomes, and reconsiders what the law implies about parallels between linguistic and genomic structure.
Findings
Menzerath's law is not inevitable in genomes.
Languages also contain equivalents of non-coding DNA.
Statistical regularities may have broader biological and linguistic significance.
Abstract
The importance of statistical patterns of language has been debated for decades. Although Zipf's law is perhaps the most popular case, Menzerath's law has recently entered the debate. Menzerath's law manifests in language, music and genomes as a tendency, in many situations, for the mean size of the parts to decrease as the number of parts increases. In genomes, for instance, it emerges as a tendency of species with more chromosomes to have a smaller mean chromosome size. It has been argued that the instantiation of this law in genomes is not indicative of any parallel between language and genomes because (a) the law is inevitable and (b) non-coding DNA dominates genomes. Here mathematical, statistical and conceptual challenges of these criticisms are discussed. Two major conclusions are drawn: the law is not inevitable and languages also…
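The genomic form of the law described in the abstract can be sketched as follows: the mean chromosome size is the genome size divided by the number of chromosomes, and the law is a negative association between that mean and the chromosome count. This is a minimal illustration, not the authors' analysis. The helper names (mean_part_size, ranks, spearman) are hypothetical, the choice of a Spearman rank correlation is an assumption, and every number is made up for the example rather than taken from real species.

```python
from statistics import mean


def mean_part_size(whole_size, n_parts):
    """Mean size of the parts, e.g. genome size / number of chromosomes."""
    return whole_size / n_parts


def ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# MADE-UP toy data: (number of chromosomes, genome size in arbitrary units).
# Genome size grows more slowly than the chromosome count.
toy_law = [(4, 400.0), (8, 560.0), (16, 800.0), (32, 960.0), (64, 1280.0)]
counts_law = [n for n, _ in toy_law]
means_law = [mean_part_size(g, n) for n, g in toy_law]
rho_law = spearman(counts_law, means_law)

# MADE-UP counterexample: genome size grows faster than the chromosome
# count (here as its square), so the mean size increases instead.
toy_counter = [(4, 16.0), (8, 64.0), (16, 256.0), (32, 1024.0)]
counts_counter = [n for n, _ in toy_counter]
means_counter = [mean_part_size(g, n) for n, g in toy_counter]
rho_counter = spearman(counts_counter, means_counter)

print(rho_law, rho_counter)
```

In the first toy set the mean size falls as the count rises, which is the pattern the law describes. In the second it rises. The sketch only shows that the sign of the association depends on how total size scales with the number of parts. It does not reproduce the paper's argument about inevitability.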
