# On the role of words in the network structure of texts: application to authorship attribution

**Authors:** Camilo Akimushkin, Diego R. Amancio, Osvaldo N. Oliveira Jr

arXiv: 1705.04187 · 2018-02-27

## TL;DR

This paper introduces a similarity measure for texts that combines the structure of their co-occurrence networks with the role of individual words in those networks, yielding much higher authorship attribution accuracy than the traditional TF-IDF approach.

## Contribution

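The paper proposes a generalized similarity measure for comparing texts that accounts for both the topology of their co-occurrence networks and the role of individual words (the node labels) in those networks, and applies it to authorship attribution.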

## Key findings

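- Typical authorship attribution accuracy ranged from 90% to 98.75% on three collections of books, each with 8 authors and 10 books per author
- Accuracy was much higher than with the traditional TF-IDF approach on the same collections, and higher than when only network topology was considered
- The network properties of specific words are as relevant as their frequency of appearance, and node identity adds information to a network representation of a text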

## Abstract

Well-established automatic analyses of texts mainly consider frequencies of linguistic units, e.g. letters, words and bigrams, while methods based on co-occurrence networks consider the structure of texts regardless of the node labels (i.e. the words' semantics). In this paper, we reconcile these distinct viewpoints by introducing a generalized similarity measure to compare texts which accounts for both the network structure of texts and the role of individual words in the networks. We use the similarity measure for authorship attribution of three collections of books, each composed of 8 authors and 10 books per author. High accuracy rates were obtained with typical values from 90% to 98.75%, much higher than with the traditional TF-IDF approach for the same collections. These accuracies are also higher than taking only the topology of networks into account. We conclude that the different properties of specific words on the macroscopic scale structure of a whole text are as relevant as their frequency of appearance; conversely, considering the identity of nodes brings further knowledge about a piece of text represented as a network.
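
To make the idea of comparing texts by the network role of labelled words more concrete, here is a minimal Python sketch. It is not the paper's similarity measure, which this summary does not specify. It assumes a co-occurrence network that links adjacent words, uses node degree as the only network property, and compares two texts with a cosine similarity over words matched by label. The function names `cooccurrence_network`, `degree_profile` and `labelled_similarity` are hypothetical, and preprocessing such as lemmatization or stopword handling is left out.

```python
import re
from collections import defaultdict
from math import sqrt


def cooccurrence_network(text):
    """Build an undirected network linking each pair of adjacent words.

    Assumption: adjacency (window of two words) defines an edge; the
    paper may use a different construction.
    """
    words = re.findall(r"[a-z']+", text.lower())
    graph = defaultdict(set)
    for left, right in zip(words, words[1:]):
        if left != right:  # skip self-loops from immediate repetitions
            graph[left].add(right)
            graph[right].add(left)
    return dict(graph)


def degree_profile(graph):
    """Map each word (node label) to its degree in the network."""
    return {word: len(neighbours) for word, neighbours in graph.items()}


def labelled_similarity(profile_a, profile_b):
    """Cosine similarity between two degree profiles, matched by word.

    Illustrative stand-in only: it keeps node identity (the word) and a
    network property (degree) in the same comparison.
    """
    shared = set(profile_a) & set(profile_b)
    dot = sum(profile_a[w] * profile_b[w] for w in shared)
    norm_a = sqrt(sum(v * v for v in profile_a.values()))
    norm_b = sqrt(sum(v * v for v in profile_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    text_a = "the cat sat on the mat"
    text_b = "the dog sat on the rug"
    prof_a = degree_profile(cooccurrence_network(text_a))
    prof_b = degree_profile(cooccurrence_network(text_b))
    print(prof_a)
    print(labelled_similarity(prof_a, prof_b))
```

In an attribution setting, one could assign an unknown book to the author whose known books have the most similar profiles, though the classifier actually used in the paper is not described in this summary.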

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1705.04187/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/1705.04187/full.md

## References

36 references — full list in the complete paper: https://tomesphere.com/paper/1705.04187/full.md

---
Source: https://tomesphere.com/paper/1705.04187