Swap distance minimization beyond entropy minimization in word order variation
Víctor Franco-Sánchez, Arnau Martí-Llobet, Ramon Ferrer-i-Cancho

TL;DR
This paper explores how natural language word order is influenced by entropy minimization and swap distance minimization, introducing a new metric and providing evidence for both principles across different linguistic structures.
Contribution
It introduces average swap distance as a novel score and uses it to show that swap distance minimization, alongside entropy minimization, shapes word order frequencies.
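The abstract defines swap distance as the number of swaps of adjacent elements needed to produce an order from a source order. For permutations this equals the number of inversions between them (the Kendall tau distance). Below is a minimal sketch of a frequency-weighted average swap distance. It assumes the score is the mean distance of the observed orders from one fixed source order, weighted by their frequencies. The paper's exact definition may differ; for instance, it could average over pairs of orders instead. The function names and the example counts are hypothetical.

```python
from itertools import combinations
from typing import Mapping, Sequence


def swap_distance(order: Sequence[str], source: Sequence[str]) -> int:
    """Minimum number of adjacent swaps that turn `source` into `order`.

    Equals the number of inversions (Kendall tau distance) between the
    two permutations; elements are assumed to be distinct.
    """
    if sorted(order) != sorted(source):
        raise ValueError("orders must be permutations of the same elements")
    # Rank each element by its position in the source order.
    rank = {element: i for i, element in enumerate(source)}
    ranks = [rank[element] for element in order]
    # Count pairs whose relative order is reversed with respect to the source.
    return sum(1 for a, b in combinations(ranks, 2) if a > b)


def average_swap_distance(freqs: Mapping[str, float], source: Sequence[str]) -> float:
    """Hypothetical score: frequency-weighted mean swap distance from `source`."""
    total = sum(freqs.values())
    if total <= 0:
        raise ValueError("frequencies must sum to a positive value")
    return sum(f * swap_distance(order, source) for order, f in freqs.items()) / total
```

With S, O and V as single letters, `swap_distance("SVO", "SOV")` is 1 (swap O and V), and the reversed order `"VOS"` is at the maximum distance of 3 for three elements. For hypothetical counts `{"SOV": 3, "SVO": 1}`, `average_swap_distance(..., "SOV")` gives 0.25. Counting inversions directly is quadratic in the number of elements, which is negligible for structures of three or four elements.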
Findings
Strong evidence for entropy and swap distance minimization in linguistic data
Average swap distance effectively captures word order preferences
Swap distance minimization effects persist beyond entropy considerations
Abstract
Consider a linguistic structure formed by $n$ elements, for instance, subject, direct object and verb ($n = 3$) or subject, direct object, indirect object and verb ($n = 4$). We investigate whether the frequency of the $n!$ possible orders is constrained by two principles. First, entropy minimization, a principle that has been suggested to shape natural communication systems at distinct levels of organization. Second, swap distance minimization, namely a preference for word orders that require fewer swaps of adjacent elements to be produced from a source order. We present average swap distance, a novel score for research on swap distance minimization. We find strong evidence of pressure for entropy minimization and swap distance minimization with respect to a die rolling experiment in distinct linguistic structures with $n = 3$ or $n = 4$. Evidence with respect to a Polya urn process is strong…
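The entropy side of the argument, and what testing "with respect to a die rolling experiment" could look like, can be sketched as follows. This is a sketch under assumptions, not the authors' procedure. It assumes the die rolling experiment redistributes the same number of observations uniformly at random over the $n!$ possible orders. It also assumes a one-sided Monte Carlo test on Shannon entropy, where low entropy means frequencies concentrated on few orders. The names `entropy` and `die_rolling_left_p` are hypothetical.

```python
import math
import random
from collections import Counter
from itertools import permutations
from typing import Mapping


def entropy(freqs: Mapping[str, float]) -> float:
    """Shannon entropy (bits) of the order distribution given raw frequencies."""
    total = sum(freqs.values())
    return -sum((f / total) * math.log2(f / total) for f in freqs.values() if f > 0)


def die_rolling_left_p(freqs: Mapping[str, int], elements: str,
                       trials: int = 10_000, seed: int = 0) -> float:
    """Monte Carlo estimate of P(H_null <= H_observed).

    Assumed null ("die rolling"): each of the observed tokens is assigned
    to one of the n! orders of `elements` uniformly at random.
    """
    rng = random.Random(seed)
    orders = ["".join(p) for p in permutations(elements)]  # the n! faces of the die
    sample_size = sum(freqs.values())
    observed = entropy(freqs)
    hits = 0
    for _ in range(trials):
        # Roll the die once per observation and tally the resulting orders.
        rolled = Counter(rng.choice(orders) for _ in range(sample_size))
        if entropy(rolled) <= observed:
            hits += 1
    return hits / trials
```

For example, `die_rolling_left_p({"SOV": 100}, "SOV", trials=200)` returns 0.0, because a six-faced die essentially never yields the same face 100 times. A small value indicates that the observed orders are more concentrated than die rolling would produce, which is the direction entropy minimization predicts. The same resampling loop could be reused with any other score of the order frequencies in place of entropy.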
Taxonomy
Topics: Algorithms and Data Compression · Natural Language Processing Techniques · Speech Recognition and Synthesis
