Entropic analysis of the role of words in literary texts
Marcelo A. Montemurro, Damian H. Zanette

TL;DR
This paper investigates how the statistical properties of words in literary texts relate to their linguistic roles, revealing patterns through entropy analysis without relying on syntactic structures.
Contribution
It introduces an entropy-based method to analyze word roles in literary texts, enabling clustering without prior syntactic knowledge.
Findings
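The role of content words in literary English is quantitatively related to the Shannon information entropy defined over an appropriate probability distribution.
Certain groups of words can be clustered according to their specific role in the text, without assuming prior knowledge of syntactic structure.
Statistical regularities in long word sequences may reflect the linguistic role of words in the message, beyond the local constraints imposed by grammar.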
Abstract
Beyond the local constraints imposed by grammar, words concatenated in long sequences carrying a complex message show statistical regularities that may reflect their linguistic role in the message. In this paper, we perform a systematic statistical analysis of the use of words in literary English corpora. We show that there is a quantitative relation between the role of content words in literary English and the Shannon information entropy defined over an appropriate probability distribution. Without assuming any previous knowledge about the syntactic structure of language, we are able to cluster certain groups of words according to their specific role in the text.
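The abstract refers to a Shannon entropy "defined over an appropriate probability distribution" without saying which one. As a minimal sketch only, the Python below assumes one plausible choice: split the text into equal-sized parts and measure how evenly a word's occurrences are spread across them. The function name word_entropy, the n_parts parameter, the equal-length partition, and the normalization by log(n_parts) are all assumptions made for illustration and are not taken from the text above.

```python
import math


def word_entropy(tokens, word, n_parts):
    """Normalized Shannon entropy of a word's occurrences across text parts.

    Assumption (not from the abstract): the text is cut into `n_parts`
    consecutive parts of equal length, and p_i is the fraction of the
    word's occurrences that fall in part i.
    Returns a value in [0, 1]: 0 if the word is confined to one part,
    1 if it is spread perfectly evenly.
    """
    if n_parts < 2:
        raise ValueError("n_parts must be at least 2")

    # Length of each part; the last part may be shorter or empty.
    part_len = math.ceil(len(tokens) / n_parts)

    # Count occurrences of the word in each part.
    counts = [0] * n_parts
    for idx, tok in enumerate(tokens):
        if tok == word:
            counts[idx // part_len] += 1

    total = sum(counts)
    if total == 0:
        raise ValueError(f"{word!r} does not occur in the text")

    # Shannon entropy of the distribution over parts (natural log).
    entropy = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            entropy -= p * math.log(p)

    # Normalize by the maximum possible entropy, log(n_parts).
    return entropy / math.log(n_parts)


if __name__ == "__main__":
    tokens = "the king the king the sea the sea".split()
    for w in ("the", "king", "sea"):
        print(w, word_entropy(tokens, w, n_parts=2))
```

Under this assumed definition, a word used uniformly throughout the text (such as "the" in the toy example) scores near 1, while a word concentrated in one part (such as "king") scores near 0. Values of this kind could then serve as features for grouping words, though the abstract does not specify the clustering procedure.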
Taxonomy
Topics: Fractal and DNA sequence analysis
