Comparing intermittency and network measurements of words and their dependency on authorship
Diego R. Amancio, Eduardo G. Altmann, Osvaldo N. Oliveira Jr., Luciano da F. Costa

TL;DR
This study investigates how topological properties of word co-occurrence networks and intermittency in word distribution vary with author style, using statistical and network analysis on 40 books, and explores their use in authorship recognition.
Contribution
It quantifies the dependency of network and intermittency features on authorship and evaluates their effectiveness in authorship attribution using machine learning.
Findings
Skewness in word intermittency distribution strongly depends on authorship.
Average shortest path length in networks varies significantly with author.
Combining network and intermittency features yields about 65% accuracy in authorship recognition.
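As a rough illustration of the network side of these findings, the sketch below builds a word co-occurrence network by linking adjacent words and computes its average shortest path length with a breadth-first search. It is a minimal sketch under stated assumptions, not the authors' pipeline: the function names and the toy sentence are hypothetical, edges are treated as undirected and unweighted, and no preprocessing (such as stopword removal or lemmatization) is applied.

```python
from collections import defaultdict, deque


def cooccurrence_network(tokens):
    """Undirected graph linking each pair of adjacent, distinct tokens.

    Assumption: adjacency window of one word, no weights, no direction.
    """
    graph = defaultdict(set)
    for a, b in zip(tokens, tokens[1:]):
        if a != b:
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


def average_shortest_path_length(graph):
    """Mean BFS distance over all ordered pairs of distinct, reachable nodes."""
    total, pairs = 0, 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbor in graph[node]:
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
        total += sum(dist.values())
        pairs += len(dist) - 1  # exclude the source itself
    return total / pairs if pairs else 0.0


# Toy text, purely illustrative (not taken from the paper's 40 books).
tokens = "the cat saw the dog and the dog saw the bird".split()
graph = cooccurrence_network(tokens)
avg_path = average_shortest_path_length(graph)
print(f"{len(graph)} nodes, average shortest path length = {avg_path:.3f}")
```

On a real book, the same measurement would be computed once per text and then compared across authors.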
Abstract
Many features from texts and languages can now be inferred from statistical analyses using concepts from complex networks and dynamical systems. In this paper we quantify how topological properties of word co-occurrence networks and intermittency (or burstiness) in word distribution depend on the style of authors. Our database contains 40 books from 8 authors who lived in the 19th and 20th centuries, for which the following network measurements were obtained: clustering coefficient, average shortest path lengths, and betweenness. We found that the two factors with stronger dependency on the authors were the skewness in the distribution of word intermittency and the average shortest paths. Other factors such as the betweenness and the Zipf's law exponent show only weak dependency on authorship. Also assessed was the contribution from each measurement to authorship recognition using three…
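To make the intermittency feature concrete, the sketch below measures the burstiness of each word from the gaps between its successive occurrences and then takes the skewness of the resulting values across words. It assumes one common definition of intermittency, the coefficient of variation of recurrence intervals, I = sqrt(<T^2>/<T>^2 - 1); the excerpt above does not spell out the exact formula, so treat this as an assumption. The function names, the `min_count` threshold, and the toy token sequence are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def intermittency(tokens, min_count=3):
    """Per-word intermittency from recurrence intervals T between occurrences.

    Assumed definition: I = sqrt(<T^2> / <T>^2 - 1), i.e. std(T) / mean(T).
    Words seen fewer than `min_count` times (hypothetical cutoff) are skipped.
    """
    positions = defaultdict(list)
    for index, token in enumerate(tokens):
        positions[token].append(index)

    result = {}
    for word, pos in positions.items():
        if len(pos) < min_count:
            continue
        gaps = [b - a for a, b in zip(pos, pos[1:])]
        mean = sum(gaps) / len(gaps)
        mean_sq = sum(g * g for g in gaps) / len(gaps)
        # Clamp at zero to guard against tiny negative rounding errors.
        result[word] = sqrt(max(mean_sq / mean ** 2 - 1.0, 0.0))
    return result


def skewness(values):
    """Population skewness: third central moment over variance^(3/2)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5 if m2 > 0 else 0.0


# Toy sequence: "x" is bursty, "a" and "b" recur at regular intervals.
tokens = list("xxxabababx")
scores = intermittency(tokens)
skew = skewness(list(scores.values()))
print(scores, skew)
```

Evenly spaced words score near 0 and bursty words score higher, so the skewness summarizes how lopsided a text's mix of regular and bursty words is.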
