Hybrid topic modelling for computational close reading: Mapping narrative themes in Pushkin's Evgenij Onegin
Angelo Maria Sabatini

TL;DR
This paper introduces a hybrid computational framework combining LDA and sPLS-DA to analyze themes and narrative structure in Pushkin's Evgenij Onegin, demonstrating its effectiveness in small-corpus literary analysis.
Contribution
It develops a novel hybrid modeling approach that integrates unsupervised and supervised techniques for thematic analysis in poetic texts, addressing small-corpus stability and interpretability.
Findings
Identified five stable, interpretable themes in Evgenij Onegin.
Enhanced thematic interpretability using supervised lexical markers.
Mapped narrative arcs and emotional structure through thematic hubs.
Abstract
This study presents a hybrid topic modelling framework for computational literary analysis that integrates Latent Dirichlet Allocation (LDA) with sparse Partial Least Squares Discriminant Analysis (sPLS-DA) to model thematic structure and longitudinal dynamics in narrative poetry. As a case study, we analyse Evgenij Onegin (Aleksandr S. Pushkin's novel in verse) using an Italian translation, testing whether unsupervised and supervised lexical structures converge in a small-corpus setting. The poetic text is segmented into thirty-five documents of lemmatised content words, from which five stable and interpretable topics emerge. To address small-corpus instability, a multi-seed consensus protocol is adopted. Using sPLS-DA as a supervised probe enhances interpretability by identifying lexical markers that refine each theme. Narrative hubs, groups of contiguous stanzas marking key…
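The abstract mentions a multi-seed consensus protocol but the excerpt does not say how it works. The sketch below is one plausible reading, not the paper's method: it assumes each seed yields a topic-word probability matrix, aligns every run to the first run by the topic permutation that maximises mean cosine similarity, and averages the aligned topics. The function names (`cosine`, `align_to_reference`, `consensus_topics`) and the toy matrices are hypothetical and chosen for illustration only.

```python
from itertools import permutations
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def align_to_reference(reference, run):
    """Find the topic ordering of `run` that best matches `reference`.

    Exhaustive search over permutations; fine for five topics
    (120 orderings), but not for large topic counts.
    """
    k = len(reference)
    best_perm, best_score = None, float("-inf")
    for perm in permutations(range(k)):
        score = sum(cosine(reference[i], run[perm[i]]) for i in range(k))
        if score > best_score:
            best_perm, best_score = perm, score
    return best_perm, best_score / k


def consensus_topics(runs):
    """Average aligned topic-word matrices across seeds.

    Returns the consensus matrix and, per run, the mean cosine
    similarity to the reference run (an assumed stability score).
    """
    reference = runs[0]
    k, vocab = len(reference), len(reference[0])
    sums = [[0.0] * vocab for _ in range(k)]
    stabilities = []
    for run in runs:
        perm, mean_sim = align_to_reference(reference, run)
        stabilities.append(mean_sim)
        for i in range(k):
            for j in range(vocab):
                sums[i][j] += run[perm[i]][j]
    n = len(runs)
    return [[x / n for x in row] for row in sums], stabilities


# Toy data (not from the paper): two seeds, two topics, three words.
# The second run found the same topics in swapped order.
run_a = [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]]
run_b = [[0.1, 0.3, 0.6], [0.6, 0.3, 0.1]]
consensus, stabilities = consensus_topics([run_a, run_b])
```

In this toy case the second run is re-ordered before averaging, so each consensus row remains a probability distribution over the vocabulary. With real LDA output, runs whose stability score falls below some chosen threshold could be discarded before averaging; that threshold would be a further assumption.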
Taxonomy
Topics: Authorship Attribution and Profiling · Topic Modeling · Sentiment Analysis and Opinion Mining
