A Language and Its Dimensions: Intrinsic Dimensions of Language Fractal Structures
Vasilii A. Gromov, Nikita S. Borodin, and Asel S. Yerbolova

TL;DR
This paper introduces the concept of a language fractal structure and estimates its intrinsic dimension for Russian and English using topological data analysis and minimum spanning trees, finding non-integer dimensions close to 9.
Contribution
It proposes a novel framework that treats a language as a fractal structure and estimates the intrinsic dimensions of such structures, a new approach to analyzing linguistic complexity.
Findings
Intrinsic dimensions are non-integer, close to 9 for both languages.
Language fractal structures can be characterized by topological and graph-based methods.
Russian and English share similar fractal dimension properties.
Abstract
The present paper introduces a novel object of study: a language fractal structure. We hypothesize that the set of embeddings of all n-grams of a natural language constitutes a representative sample of this fractal set. (We use the term Hailonakea to refer to the totality of language fractal structures over all n.) The paper estimates the intrinsic (genuine) dimensions of language fractal structures for Russian and English. To this end, we employ methods based on (1) topological data analysis and (2) the minimum spanning tree of the data graph for the point cloud under consideration (Steele's theorem). For both languages and for all n, the intrinsic dimensions turn out to be non-integer (as is typical for fractal sets) and close to 9.
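The minimum-spanning-tree approach mentioned above can be illustrated with Steele's theorem. For n points drawn i.i.d. from a distribution with a density on R^d (d >= 2), the sum of MST edge lengths raised to a power alpha (0 < alpha < d) grows as L_alpha(n) ~ C * n^((d - alpha)/d). The log-log slope m of L_alpha against n therefore gives d = alpha / (1 - m). The sketch below is a minimal illustration of that idea and not the authors' code. The function names, the subsample sizes, the number of repeats, and the use of SciPy are all our own assumptions. A common TDA-based variant, the persistent-homology dimension, performs the same fit because the finite lifetimes in 0-dimensional persistent homology of a Rips filtration are the MST edge lengths, up to a constant factor set by the filtration convention. The abstract does not state whether the paper's TDA estimator takes exactly this form.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform


def mst_length(points: np.ndarray, alpha: float = 1.0) -> float:
    """Sum of Euclidean MST edge lengths, each raised to the power alpha."""
    dist = squareform(pdist(points))      # dense pairwise distance matrix
    mst = minimum_spanning_tree(dist)     # sparse matrix holding the n - 1 MST edges
    return float(np.sum(mst.data ** alpha))


def mst_dimension(points: np.ndarray, alpha: float = 1.0, min_size: int = 100,
                  num_sizes: int = 8, n_repeats: int = 3, seed: int = 0) -> float:
    """Estimate intrinsic dimension from the growth rate of the MST length.

    Assumes Steele-type asymptotics L_alpha(n) ~ C * n**((d - alpha) / d),
    so the fitted log-log slope m yields d = alpha / (1 - m).
    """
    # SciPy treats zero distances as missing edges, so drop duplicate points.
    points = np.unique(np.asarray(points, dtype=float), axis=0)
    n_total = len(points)
    if n_total < 2 * min_size:
        raise ValueError("need at least 2 * min_size distinct points")
    rng = np.random.default_rng(seed)
    # Geometrically spaced subsample sizes between min_size and n_total.
    sizes = np.unique(np.geomspace(min_size, n_total, num=num_sizes).astype(int))
    log_n, log_len = [], []
    for n in sizes:
        # Average the MST length over several random subsamples of size n.
        lengths = [mst_length(points[rng.choice(n_total, size=n, replace=False)], alpha)
                   for _ in range(n_repeats)]
        log_n.append(np.log(n))
        log_len.append(np.log(np.mean(lengths)))
    slope, _ = np.polyfit(log_n, log_len, deg=1)
    return alpha / (1.0 - slope)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Sanity check on a uniform cube: the estimate should land near 3,
    # typically slightly below it because of boundary effects.
    print(mst_dimension(rng.random((1500, 3))))
```

To apply this to language data, one would pass a matrix of n-gram embedding vectors (for example, a hypothetical `ngram_vectors` array of shape (num_ngrams, embedding_dim)) as `points`. At dimensions near 9, the MST length converges more slowly, so much larger samples than in the sanity check are likely needed before the fitted slope stabilizes.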
Taxonomy
Topics: Topological and Geometric Data Analysis
Methods: Sparse Evolutionary Training
