# Labelled network subgraphs reveal stylistic subtleties in written texts

**Authors:** Vanessa Q. Marinho, Graeme Hirst, Diego R. Amancio

arXiv: 1705.00545 · 2017-11-09

## TL;DR

This paper introduces a hybrid network-based classifier, called labelled subgraphs, that combines the frequency of common words with network motifs. It applies the classifier to authorship attribution and translationese identification, and the results suggest that the approach can represent texts for both tasks.

## Contribution

It presents a hybrid classifier that integrates network motifs with word frequency. This bridges complex network models and traditional statistical methods, which are not usually combined.

## Key findings

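- Labelled subgraphs can be used to attribute authorship and to distinguish original texts from translated ones (translationese).
- Network motifs combined with common-word labels capture stylistic subtleties in texts.
- The method may extend to other tasks, such as the analysis of text complexity, language proficiency, and machine translation.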

## Abstract

The vast amount of data and increase of computational capacity have allowed the analysis of texts from several perspectives, including the representation of texts as complex networks. Nodes of the network represent the words, and edges represent some relationship, usually word co-occurrence. Even though networked representations have been applied to study some tasks, such approaches are not usually combined with traditional models relying upon statistical paradigms. Because networked models are able to grasp textual patterns, we devised a hybrid classifier, called labelled subgraphs, that combines the frequency of common words with small structures found in the topology of the network, known as motifs. Our approach is illustrated in two contexts, authorship attribution and translationese identification. In the former, a set of novels written by different authors is analyzed. To identify translationese, texts from the Canadian Hansard and the European Parliament were classified as original or translated instances. Our results suggest that labelled subgraphs are able to represent texts and that the approach should be further explored in other tasks, such as the analysis of text complexity, language proficiency, and machine translation.
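The abstract describes the pipeline only at a high level. The Python sketch below illustrates the general idea under several assumptions that do not come from this summary. Edges link adjacent words (a co-occurrence window of two), only the `k` most frequent words act as node labels, and only connected three-node induced subgraphs (paths and triangles) are counted. The names `cooccurrence_network`, `labelled_triads`, and `relative_frequencies` are hypothetical. This is an illustration, not the authors' implementation, and the paper's exact motif sizes and labelling scheme may differ.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_network(tokens):
    """Undirected word graph: an edge joins each pair of adjacent tokens (assumed window of 2)."""
    adj = {}
    for a, b in zip(tokens, tokens[1:]):
        if a == b:
            continue  # skip self-loops such as "very very"
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def labelled_triads(tokens, k=10):
    """Count connected 3-node induced subgraphs whose nodes are all among the k most
    frequent words, keyed by motif shape plus the word labels (hypothetical scheme)."""
    labels = {w for w, _ in Counter(tokens).most_common(k)}
    adj = cooccurrence_network(tokens)
    nodes = sorted(w for w in labels if w in adj)  # sorted so trios come out canonical
    counts = Counter()
    for trio in combinations(nodes, 3):
        a, b, c = trio
        n_edges = (b in adj[a]) + (c in adj[a]) + (c in adj[b])
        if n_edges == 2:
            # Path motif: the centre is the only node linked to both others.
            centre = next(x for x in trio if sum(y in adj[x] for y in trio if y != x) == 2)
            ends = tuple(sorted(x for x in trio if x != centre))
            counts[("path", centre, ends)] += 1
        elif n_edges == 3:
            counts[("triangle", trio)] += 1
    return counts


def relative_frequencies(counts):
    """Normalise motif counts so texts of different lengths are comparable."""
    total = sum(counts.values())
    return {key: n / total for key, n in counts.items()} if total else {}


if __name__ == "__main__":
    text = "the cat sat on the mat and the cat ran".split()
    triads = labelled_triads(text, k=10)
    print(triads[("triangle", ("and", "mat", "the"))])
    print(relative_frequencies(triads))
```

The normalised motif frequencies could then be fed to any standard classifier. This summary does not say which classifier or feature-selection procedure the authors used.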

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1705.00545/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/1705.00545/full.md

## References

54 references — full list in the complete paper: https://tomesphere.com/paper/1705.00545/full.md

---
Source: https://tomesphere.com/paper/1705.00545