Ordinal analysis of lexical patterns
David Sanchez, Luciano Zunino, Juan De Gregorio, Raul Toral, and Claudio Mirasso

TL;DR
This paper introduces an ordinal pattern analysis method to study lexical correlations across 11 languages, revealing language-specific pattern distributions whose fluctuations make it possible to date a text and attribute its authorship.
Contribution
The paper presents an ordinal pattern approach to analyzing statistical connections between words, highlighting language-specific pattern distributions and applications in linguistic typology, historical linguistics, and stylometry.
Findings
Distinct pattern distributions for each language
Pattern fluctuations can identify historical periods
Ordinal analysis aids in authorship attribution
Abstract
Words are fundamental linguistic units that connect thoughts and things through meaning. However, words do not appear independently in a text sequence. The existence of syntactic rules induces correlations among neighboring words. Using an ordinal pattern approach, we present an analysis of lexical statistical connections for 11 major languages. We find that the diverse ways in which languages express word relations give rise to unique structural pattern distributions. Furthermore, fluctuations of these pattern distributions for a given language allow us to determine both the historical period in which a text was written and its author. Taken together, our results emphasize the relevance of ordinal time series analysis in linguistic typology, historical linguistics, and stylometry.
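The abstract does not spell out how a text becomes an ordinal series, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes each word is replaced by its frequency rank within the text, that patterns are taken over overlapping windows of three consecutive words, and that ties (a word repeated inside one window) are broken by order of appearance. The function names `word_ranks`, `ordinal_pattern`, and `pattern_distribution` are hypothetical.

```python
from collections import Counter
from itertools import permutations


def word_ranks(tokens):
    """Map each token to its frequency rank (1 = most frequent).

    Assumption: rank within the text itself; equal counts are ranked
    in order of first appearance (Counter.most_common keeps that order).
    """
    counts = Counter(tokens)
    rank_of = {w: r for r, (w, _) in enumerate(counts.most_common(), start=1)}
    return [rank_of[t] for t in tokens]


def ordinal_pattern(window):
    """Return the index permutation that sorts the window ascending.

    sorted() is stable, so tied values keep their order of appearance.
    """
    return tuple(sorted(range(len(window)), key=lambda i: window[i]))


def pattern_distribution(series, order=3):
    """Relative frequency of every ordinal pattern of the given order."""
    n_windows = len(series) - order + 1
    if n_windows <= 0:
        raise ValueError("series is shorter than the pattern order")
    counts = Counter(
        ordinal_pattern(series[i:i + order]) for i in range(n_windows)
    )
    # Report all order! patterns, including those that never occur.
    return {p: counts.get(p, 0) / n_windows for p in permutations(range(order))}


if __name__ == "__main__":
    tokens = "the cat saw the dog and the dog saw the cat".split()
    for pattern, prob in pattern_distribution(word_ranks(tokens)).items():
        print(pattern, round(prob, 3))
```

Comparing such distributions between texts (for example, across languages, periods, or authors) would be the next step; the choice of distance measure is left open here because the abstract does not specify one.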
Taxonomy
Topics: Advanced Text Analysis Techniques · Language and cultural evolution · Fractal and DNA sequence analysis
