Comparative Computational Analysis of Global Structure in Canonical, Non-Canonical and Non-Literary Texts
Mahdi Mohseni, Volker Gast, Christoph Redies

TL;DR
This study compares the global structure of canonical, non-canonical, and non-literary texts using multifractal analysis. It finds that low-level linguistic features distinguish the three text types better than high-level ones, and that fractality is a universal property of text.
Contribution
It introduces a novel application of multifractal analysis to compare global structural properties across different text categories, highlighting the discriminative power of low-level features.
Findings
Low-level properties discriminate the three text types better than high-level properties.
Canonical texts differ from non-canonical texts mainly in variability.
Fractality is a universal feature of text, more pronounced in non-literary than in literary texts.
Abstract
This study investigates global properties of literary and non-literary texts. Within the literary texts, a distinction is made between canonical and non-canonical works. The central hypothesis of the study is that the three text types (non-literary, literary/canonical and literary/non-canonical) exhibit systematic differences with respect to structural design features as correlates of aesthetic responses in readers. To investigate these differences, we compiled a corpus containing texts of the three categories of interest, the Jena Textual Aesthetics Corpus. Two aspects of global structure are investigated, variability and self-similar (fractal) patterns, which reflect long-range correlations along texts. We use four types of basic observations, (i) the frequency of POS-tags per sentence, (ii) sentence length, (iii) lexical diversity in chunks of text, and (iv) the distribution of topic…
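As a rough illustration of the two aspects named in the abstract, the sketch below builds a sentence-length series from raw text, measures its variability, and estimates a scaling exponent for long-range correlations. It is not the authors' pipeline. All function names are hypothetical, the sentence splitter is a naive regular expression, the variability measure (coefficient of variation) is an assumed choice, and plain monofractal detrended fluctuation analysis (DFA) stands in for the multifractal analysis used in the paper.

```python
import math
import re
import statistics


def sentence_lengths(text):
    """Naive split on sentence-final punctuation; count whitespace tokens."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]


def coefficient_of_variation(series):
    """Population standard deviation divided by the mean (assumed measure)."""
    return statistics.pstdev(series) / statistics.mean(series)


def _mean_squared_residual(window):
    """Mean squared residual after removing a least-squares line."""
    n = len(window)
    x_mean = (n - 1) / 2
    y_mean = sum(window) / n
    sxx = sum((i - x_mean) ** 2 for i in range(n))
    sxy = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(window))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return sum((y - (slope * i + intercept)) ** 2
               for i, y in enumerate(window)) / n


def dfa_exponent(series, scales):
    """Estimate the DFA scaling exponent of a numeric series.

    Roughly 0.5 indicates no long-range correlation; larger values
    indicate persistent long-range correlations.
    """
    mean = sum(series) / len(series)
    # Profile: cumulative sum of deviations from the mean.
    profile, total = [], 0.0
    for value in series:
        total += value - mean
        profile.append(total)

    log_n, log_f = [], []
    for n in scales:
        if n < 4:
            continue  # too short for a meaningful linear detrend
        windows = [profile[i:i + n]
                   for i in range(0, len(profile) - n + 1, n)]
        if len(windows) < 2:
            continue  # scale too large for this series
        f2 = sum(_mean_squared_residual(w) for w in windows) / len(windows)
        if f2 > 0:
            log_n.append(math.log(n))
            log_f.append(0.5 * math.log(f2))  # log of the RMS fluctuation

    if len(log_n) < 2:
        raise ValueError("need at least two usable scales")

    # Slope of log F(n) against log n by ordinary least squares.
    x_mean = sum(log_n) / len(log_n)
    y_mean = sum(log_f) / len(log_f)
    sxx = sum((x - x_mean) ** 2 for x in log_n)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(log_n, log_f))
    return sxy / sxx
```

The same `dfa_exponent` call could be applied to any of the per-sentence or per-chunk series listed in the abstract, for example `dfa_exponent(sentence_lengths(text), scales=[8, 16, 32, 64])`, provided the text is long enough to yield at least two windows at each scale.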
