Entropic analysis of the role of words in literary texts

Marcelo A. Montemurro; Damian H. Zanette

arXiv:cond-mat/0109218 · cond-mat.stat-mech · May 23, 2007 · 5 cites

Open Access

TL;DR

This paper investigates how the statistical properties of words in literary texts relate to their linguistic roles, revealing patterns through entropy analysis without relying on syntactic structures.

Contribution

It introduces an entropy-based method to analyze word roles in literary texts, enabling clustering without prior syntactic knowledge.

Findings

1. Content words show a quantifiable relation to Shannon entropy.

2. Words can be clustered based on their roles without syntactic assumptions.

3. Statistical regularities reflect linguistic functions in literature.

Abstract

Beyond the local constraints imposed by grammar, words concatenated in long sequences carrying a complex message show statistical regularities that may reflect their linguistic role in the message. In this paper, we perform a systematic statistical analysis of the use of words in literary English corpora. We show that there is a quantitative relation between the role of content words in literary English and the Shannon information entropy defined over an appropriate probability distribution. Without assuming any previous knowledge about the syntactic structure of language, we are able to cluster certain groups of words according to their specific role in the text.
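
As an illustration of the kind of quantity the abstract refers to, the sketch below computes a Shannon entropy for each word from how its occurrences spread across a text. The abstract does not specify the "appropriate probability distribution", so this example assumes one: the text is cut into `n_parts` equal segments, and a word's distribution is the fraction of its occurrences falling in each segment. The function name `word_entropies`, the segmentation scheme, and the normalization by `log(n_parts)` are all choices made for this example and should not be read as the authors' exact procedure.

```python
import math
from collections import Counter


def word_entropies(tokens, n_parts):
    """Normalized Shannon entropy of each word's spread over text segments.

    Assumption (not taken from the abstract): the text is split into
    n_parts equal-length segments, and p_i is the fraction of a word's
    occurrences that fall in segment i. Requires n_parts >= 2.
    """
    part_len = len(tokens) // n_parts  # trailing leftover tokens are ignored
    counts = {}  # word -> list of per-segment occurrence counts
    for i in range(n_parts):
        segment = tokens[i * part_len:(i + 1) * part_len]
        for word, c in Counter(segment).items():
            counts.setdefault(word, [0] * n_parts)[i] = c

    entropies = {}
    for word, per_part in counts.items():
        total = sum(per_part)
        # H = -sum_i p_i * log(p_i), skipping empty segments (0 * log 0 := 0)
        h = -sum((c / total) * math.log(c / total) for c in per_part if c > 0)
        entropies[word] = h / math.log(n_parts)  # rescale to the range [0, 1]
    return entropies


tokens = ["the", "war", "the", "war", "the", "peace", "the", "peace"]
scores = word_entropies(tokens, n_parts=2)
```

Under this assumed definition, a word spread evenly over all segments (here "the") scores 1, while a word confined to a single segment (here "war" or "peace") scores 0. Frequency also affects the entropy a word can attain in a finite text, so a raw comparison across words of very different frequencies should be interpreted with care.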

Taxonomy

Topics: Fractal and DNA sequence analysis