Lexical Base as a Compressed Language Model of the World (on the material of the Ukrainian language)
Solomiya Buk

TL;DR
This paper shows that a list of Ukrainian words selected by statistical criteria forms an interconnected lexical base covering all areas of human activity. The coverage claim is verified against a universal synoptical scheme.
Contribution
The paper introduces the concept of a lexical base: a system of interrelated words selected by statistical methods, applicable to any language.
Findings
The selected word list forms an interconnected system
The lexical base covers all spheres of human activity
An invariant synoptical scheme confirms the system's universality
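The universality check amounts to a coverage test: every class of the synoptical scheme should contain at least one word from the selected list. This is a minimal sketch, assuming the scheme can be represented as a mapping from class names to the word sets an ideographic dictionary assigns to them. The function names, class labels and example words are hypothetical placeholders, not the paper's actual scheme.

```python
from typing import Dict, Iterable, Set


def uncovered_classes(word_list: Iterable[str],
                      scheme: Dict[str, Set[str]]) -> Set[str]:
    """Return the scheme classes that no word in word_list falls into.

    `scheme` maps each (hypothetical) class of a synoptical scheme to the
    set of words an ideographic dictionary assigns to that class.
    """
    words = set(word_list)
    return {cls for cls, members in scheme.items() if not words & members}


def coverage(word_list: Iterable[str], scheme: Dict[str, Set[str]]) -> float:
    """Share of scheme classes that contain at least one listed word."""
    if not scheme:
        return 0.0
    missing = uncovered_classes(word_list, scheme)
    return 1 - len(missing) / len(scheme)


# Toy scheme with illustrative class names (not taken from the paper).
SCHEME = {
    "nature": {"water", "tree", "sun"},
    "human being": {"hand", "head", "life"},
    "society": {"state", "law", "work"},
    "abstract relations": {"time", "place", "cause"},
}

if __name__ == "__main__":
    base = ["water", "hand", "work", "time", "table"]
    print(uncovered_classes(base, SCHEME))  # set()
    print(coverage(base, SCHEME))           # 1.0
```

A list that leaves some class empty would show up in `uncovered_classes`, which makes gaps in the lexical base easy to inspect class by class.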
Abstract
The article verifies that a list of words selected by formal statistical methods (frequency and functional-genre unrestrictedness) is not a conglomerate of unrelated words. Instead, it forms a system of interrelated items that can be called the "lexical base of language". This selected list of words covers all spheres of human activity. To verify this claim, an invariant synoptical scheme common to the ideographic dictionaries of different languages was determined.
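The selection step combines two criteria: overall frequency and spread across functional genres. This is a minimal sketch, assuming that "functional-genre unrestrictedness" can be approximated by the number of genre subcorpora in which a word occurs. The function name `select_lexical_base`, the thresholds and the toy corpus are hypothetical. Real input would be lemmatized Ukrainian text split by functional genre.

```python
from collections import Counter
from typing import Dict, List, Set


def select_lexical_base(genre_tokens: Dict[str, List[str]],
                        min_freq: int,
                        min_genres: int) -> Set[str]:
    """Select words that are frequent and not restricted to a few genres.

    Hypothetical criteria (not the paper's exact procedure):
      * total frequency across all genre subcorpora >= min_freq;
      * the word occurs in at least min_genres distinct genres.
    """
    total: Counter = Counter()         # word -> frequency over the whole corpus
    genre_spread: Counter = Counter()  # word -> number of genres it occurs in
    for tokens in genre_tokens.values():
        counts = Counter(tokens)
        total.update(counts)
        genre_spread.update(counts.keys())  # each word counted once per genre
    return {w for w, f in total.items()
            if f >= min_freq and genre_spread[w] >= min_genres}


# Toy corpus: already tokenized, one subcorpus per (hypothetical) genre.
CORPUS = {
    "fiction": "the house was quiet and the night was long".split(),
    "journalism": "the government said the house vote was long delayed".split(),
    "science": "the sample was heated and the reaction was slow".split(),
    "official": "the house shall approve the budget and the law".split(),
}

if __name__ == "__main__":
    print(sorted(select_lexical_base(CORPUS, min_freq=3, min_genres=3)))
    # ['and', 'house', 'the', 'was']
```

Here "long" is excluded even though it appears twice, because it occurs in only two genres. The thresholds are arbitrary; the two criteria pull in different directions, so raising `min_genres` favors general-purpose words over frequent but genre-bound terms.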
