A model of language inflection graphs
Henryk Fukś, Babak Farzad, Yi Cao

TL;DR
This paper models the complex network of inflectional forms in languages like Latin and Polish using bipartite graphs, revealing structural properties similar to percolation phenomena and proposing a simple generative model.
Contribution
It introduces the simplest bipartite graph model of inflection graphs and analyzes their component structure, connecting linguistic inflection networks to percolation theory.
Findings
The distribution of word-group sizes resembles the cluster-size distribution of lattice percolation near criticality
The proposed model reproduces key topological features of real inflection graphs
Connected components of the projection have a scale-like size distribution
Abstract
Inflection graphs are highly complex networks representing relationships between inflectional forms of words in human languages. For so-called synthetic languages, such as Latin or Polish, they have a particularly interesting structure due to the abundance of inflectional forms. We construct the simplest form of inflection graphs, namely a bipartite graph in which one group of vertices corresponds to dictionary headwords and the other group to inflected forms encountered in a given text. We then study the projection of this graph on the set of headwords. The projection decomposes into a large number of connected components, to be called word groups. The distribution of word-group sizes exhibits some remarkable properties, resembling the cluster distribution in lattice percolation near the critical point. We propose a simple model which produces graphs of this type, reproducing the desired component…
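The construction in the abstract (bipartite headword/form graph, projection onto headwords, word groups as connected components, and their size distribution) can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes the bipartite graph is given as a list of (headword, inflected form) pairs, and the names `word_groups` and `size_distribution`, as well as the toy Latin pairs, are hypothetical. It covers only the graph construction, not the generative model proposed in the paper. Instead of materializing the projection, it uses union-find: a shared form turns its headwords into a clique in the projection, and linking each of them to one representative yields the same components.

```python
from collections import Counter, defaultdict


def word_groups(pairs):
    """Return the connected components ("word groups") of the headword projection.

    Hypothetical helper. `pairs` lists the edges of the bipartite graph as
    (headword, inflected_form) tuples. Two headwords are adjacent in the
    projection when they share at least one inflected form.
    """
    parent = {}  # union-find forest over headwords only

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Collect, for every inflected form, the headwords that produce it.
    heads_by_form = defaultdict(set)
    for head, form in pairs:
        heads_by_form[form].add(head)
        find(head)  # register headwords that may end up isolated

    # A shared form makes its headwords a clique in the projection; linking
    # each of them to one representative yields the same components.
    for heads in heads_by_form.values():
        first, *rest = sorted(heads)
        for other in rest:
            union(first, other)

    groups = defaultdict(set)
    for head in parent:
        groups[find(head)].add(head)
    return list(groups.values())


def size_distribution(groups):
    """Map each word-group size to the number of groups of that size."""
    return Counter(len(g) for g in groups)


# Toy Latin input (illustrative only, not data from the paper; vowel length ignored):
# "est" is a form of both sum and edo; "mali" and "malo" are shared by malus and
# malum, and "malo" is also a form of the verb malo ("I prefer").
pairs = [
    ("sum", "sum"), ("sum", "est"),
    ("edo", "edo"), ("edo", "est"),
    ("malus", "mali"), ("malus", "malo"),
    ("malum", "mali"), ("malum", "malo"),
    ("malo", "malo"), ("malo", "mavis"),
    ("rosa", "rosa"), ("rosa", "rosae"),
]

groups = word_groups(pairs)
print(sorted(sorted(g) for g in groups))
# [['edo', 'sum'], ['malo', 'malum', 'malus'], ['rosa']]
print(sorted(size_distribution(groups).items()))
# [(1, 1), (2, 1), (3, 1)]
```

On a real corpus, the pairs would come from lemmatizing every token of a text against a dictionary; the resulting size histogram is the quantity the abstract compares with percolation cluster statistics.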
Taxonomy
Topics: Algorithms and Data Compression
