Text authorship identified using the dynamics of word co-occurrence networks
Camilo Akimushkin, Diego R. Amancio, Osvaldo N. Oliveira Jr

TL;DR
This paper presents a novel method for authorship attribution using the dynamics of word co-occurrence networks, achieving high accuracy in classifying texts by authors through network metric analysis and machine learning.
Contribution
The study introduces a new network-based approach for authorship identification that leverages the dynamics of co-occurrence networks and supervised learning, improving classification efficiency.
Findings
85% correct classification rate (68 of 80 texts, 8 authors)
Stationarity of network metric time series validated
Dynamic network metrics effectively characterize authorship
Abstract
The identification of authorship in disputed documents still requires human expertise, which is now unfeasible for many tasks owing to the large volumes of text and authors in practical applications. In this study, we introduce a methodology based on the dynamics of word co-occurrence networks representing written texts to classify a corpus of 80 texts by 8 authors. The texts were divided into sections with an equal number of linguistic tokens, from which time series were created for 12 topological metrics. The series were proven to be stationary (p-value>0.05), which permits the use of distribution moments as learning attributes. With an optimized supervised learning procedure using a Radial Basis Function Network, 68 out of 80 texts were correctly classified, i.e. a remarkable 85% author matching success rate. Therefore, fluctuations in purely dynamic network metrics were found to…
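The pipeline described in the abstract (equal-sized sections, one co-occurrence network per section, a time series per metric, distribution moments as attributes) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration rather than the authors' implementation: the function names are hypothetical, words are linked only when adjacent, only two metrics (average degree and average clustering) stand in for the 12 used in the paper, and only the first two moments are computed. Preprocessing, the stationarity test, and the Radial Basis Function Network classifier are not shown.

```python
from collections import defaultdict
from statistics import mean, pstdev


def cooccurrence_network(tokens):
    """Undirected graph linking each pair of adjacent, distinct tokens.

    Assumption: adjacency-based co-occurrence; the abstract does not
    specify the linking rule.
    """
    adj = defaultdict(set)
    for a, b in zip(tokens, tokens[1:]):
        if a != b:
            adj[a].add(b)
            adj[b].add(a)
    return dict(adj)


def average_degree(adj):
    """Mean number of neighbours per node."""
    return mean(len(nbrs) for nbrs in adj.values()) if adj else 0.0


def average_clustering(adj):
    """Mean local clustering coefficient; nodes with degree < 2 count as 0."""
    coeffs = []
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            coeffs.append(0.0)
            continue
        nbr_list = list(nbrs)
        # Count links that exist between pairs of neighbours.
        links = sum(
            1
            for i, u in enumerate(nbr_list)
            for v in nbr_list[i + 1:]
            if v in adj[u]
        )
        coeffs.append(2.0 * links / (k * (k - 1)))
    return mean(coeffs) if coeffs else 0.0


# Hypothetical metric registry; the paper uses 12 topological metrics.
METRICS = {
    "average_degree": average_degree,
    "average_clustering": average_clustering,
}


def metric_series(tokens, section_size):
    """One value per metric per section; a trailing partial section is dropped."""
    series = {name: [] for name in METRICS}
    for start in range(0, len(tokens) - section_size + 1, section_size):
        adj = cooccurrence_network(tokens[start:start + section_size])
        for name, fn in METRICS.items():
            series[name].append(fn(adj))
    return series


def moment_features(series_by_metric):
    """Mean and population standard deviation of each metric's time series."""
    features = {}
    for name, series in series_by_metric.items():
        features[name + "_mean"] = mean(series)
        features[name + "_std"] = pstdev(series)
    return features


if __name__ == "__main__":
    toy_tokens = "a b c a a b c d".split()
    print(moment_features(metric_series(toy_tokens, section_size=4)))
```

Each text then yields one fixed-length feature vector regardless of its length, which is what makes the series usable as input to a standard supervised classifier.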
