TL;DR
This paper introduces a new score that measures how close the word order of human languages comes to minimizing dependency distances. It finds that half of the 93 languages analyzed are optimized to 70% or more, and it confirms theoretical predictions about how sentence length relates to dependency length.
Contribution
The paper presents a novel optimality score for syntactic dependency distances and applies it to sentences from 93 languages in 19 linguistic families, providing a hierarchical ranking of languages by their degree of optimization.
Findings
Half of the languages are optimized to 70% or more.
Longer sentences tend to be more optimized.
In short sentences, distances are more likely to be longer than expected by chance.
Abstract
It is often stated that human languages, like other biological systems, are shaped by cost-cutting pressures, but to what extent? Attempts to quantify the degree of optimality of languages by means of an optimality score have been scarce and focused mostly on English. Here we recast the problem of the optimality of the word order of a sentence as an optimization problem on a spatial network where the vertices are words, arcs indicate syntactic dependencies and the space is defined by the linear order of the words in the sentence. We introduce a new score to quantify the cognitive pressure to reduce the distance between linked words in a sentence. The analysis of sentences from 93 languages representing 19 linguistic families reveals that half of the languages are optimized to 70% or more. The score indicates that distances are not significantly reduced in a few languages and confirms two…
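The setup described in the abstract (words as vertices, dependencies as arcs, distance measured along the linear order) can be illustrated with a small sketch. This is not the paper's implementation. The function names are hypothetical, and the normalization used here, (random baseline - observed) / (random baseline - minimum), is an assumption about the general form of such a score, since this summary does not reproduce the exact definition. The random baseline relies on a standard fact: a tree with n words placed in a uniformly random order has an expected sum of dependency distances of (n^2 - 1) / 3. The minimum is found by brute force, so the sketch only suits short sentences.

```python
from itertools import permutations


def dependency_distance_sum(edges, position):
    """Sum of |position[head] - position[dependent]| over all arcs."""
    return sum(abs(position[u] - position[v]) for u, v in edges)


def min_distance_sum(edges, n):
    """Smallest possible sum over all n! word orders (brute force, small n only)."""
    # perm[i] is the position assigned to word i in a candidate order.
    return min(dependency_distance_sum(edges, perm)
               for perm in permutations(range(n)))


def random_baseline(n):
    """Expected sum for a tree of n words in a uniformly random order."""
    return (n * n - 1) / 3


def optimality_score(edges, n):
    """Assumed normalization: 1 at the minimum, 0 at the random baseline,
    negative when distances are longer than expected by chance."""
    if n < 3:
        raise ValueError("baseline and minimum coincide for n < 3")
    observed = dependency_distance_sum(edges, tuple(range(n)))
    d_min = min_distance_sum(edges, n)
    d_rand = random_baseline(n)
    return (d_rand - observed) / (d_rand - d_min)


# "The dog chased the cat": words are numbered by their observed position,
# and each arc is a (head, dependent) pair. The parse is illustrative.
sentence_edges = [(1, 0), (2, 1), (4, 3), (2, 4)]
print(optimality_score(sentence_edges, 5))

# A short star-shaped tree whose hub sits at the edge of the sentence.
star_edges = [(0, 1), (0, 2), (0, 3)]
print(optimality_score(star_edges, 4))
```

Under these assumptions the first sentence scores 0.75, while the second scores -1.0: its distances are longer than the random baseline, the kind of outcome the findings associate with short sentences.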
