Markov reads Pushkin, again: A statistical journey into the poetic world of Evgenij Onegin
Angelo Maria Sabatini

TL;DR
This paper uses symbolic time series analysis and Markov models to explore phonological and structural patterns in Evgenij Onegin and its Italian translation, revealing an asymmetry in memory depth between the two texts and links between surface form and narrative cues.
Contribution
It demonstrates that minimalist Markov models, combined with linguistic annotation, can effectively analyze complex poetic structures and support comparative poetics.
Findings
A compact four-state Markov chain reproduces key features of the original sequences, such as autocorrelation and memory depth.
The Russian original shows a gradual decline in memory depth, unlike the translation.
Phonological probes reveal connections between surface form and narrative cues.
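As a rough illustration of the first finding, the sketch below estimates a four-state chain from a V/C string. It assumes the four states are the overlapping symbol pairs VV, VC, CV and CC (that is, a second-order chain on the binary sequence); the summary does not say how the paper defines its states, so this choice, the function name pair_transition_matrix, and the toy string are all hypothetical.

```python
from collections import Counter

# Assumed state space: overlapping pairs of V/C symbols (not confirmed by the source).
STATES = ["VV", "VC", "CV", "CC"]

def pair_transition_matrix(seq):
    """Estimate transition probabilities between overlapping pairs of a V/C string."""
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    counts = Counter(zip(pairs, pairs[1:]))
    matrix = {}
    for src in STATES:
        total = sum(counts[(src, dst)] for dst in STATES)
        # Rows for states that never occur are left as all zeros.
        matrix[src] = {
            dst: (counts[(src, dst)] / total if total else 0.0) for dst in STATES
        }
    return matrix

toy = "CVCCVCVVCCVC"  # toy sequence for illustration, not data from the paper
P = pair_transition_matrix(toy)
```

Because consecutive pairs overlap, half of the sixteen entries are structurally zero (for example, VV can only be followed by VV or VC), so this chain has at most four free parameters.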
Abstract
This study applies symbolic time series analysis and Markov modeling to explore the phonological structure of Evgenij Onegin, as captured through a graphemic vowel/consonant (V/C) encoding, and one contemporary Italian translation. Using a binary encoding inspired by Markov's original scheme, we construct minimalist probabilistic models that capture both local V/C dependencies and large-scale sequential patterns. A compact four-state Markov chain is shown to be descriptively accurate and generative, reproducing key features of the original sequences such as autocorrelation and memory depth. All findings are exploratory in nature and aim to highlight structural regularities while suggesting hypotheses about underlying narrative dynamics. The analysis reveals a marked asymmetry between the Russian and Italian texts: the original exhibits a gradual decline in memory depth, whereas the…
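The graphemic V/C encoding and the autocorrelation mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the paper's pipeline: the vowel set, the decision to skip the hard and soft signs (ъ, ь), and the names encode_vc and autocorrelation are assumptions made here. The example string is the opening line of the poem.

```python
RUSSIAN_VOWELS = set("аеёиоуыэюя")
SKIPPED = set("ъь")  # assumption: hard and soft signs are dropped, not encoded

def encode_vc(text, vowels=RUSSIAN_VOWELS, skipped=SKIPPED):
    """Map each letter to 'V' or 'C'; spaces, punctuation and skipped signs are ignored."""
    out = []
    for ch in text.lower():
        if not ch.isalpha() or ch in skipped:
            continue
        out.append("V" if ch in vowels else "C")
    return "".join(out)

def autocorrelation(seq, lag):
    """Sample autocorrelation of the binary indicator (V = 1, C = 0) at a given lag."""
    x = [1.0 if s == "V" else 0.0 for s in seq]
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    if var == 0 or lag >= n:
        return 0.0
    cov = sum((x[i] - mean) * (x[i + lag] - mean) for i in range(n - lag)) / n
    return cov / var

line = "Мой дядя самых честных правил"
vc = encode_vc(line)          # "CVCCVCVCVCVCCVCCCVCCCVCVC"
rho1 = autocorrelation(vc, 1)  # negative here: vowels and consonants tend to alternate
```

For an Italian text the same function could be called with a different vowel set (for example, a, e, i, o, u and their accented forms), which is again an assumption rather than the paper's stated rule.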
