Authorship attribution via network motifs identification

Vanessa Queiroz Marinho; Graeme Hirst; Diego Raphael Amancio

arXiv:1607.06961·cs.CL·February 7, 2017

TL;DR

This paper uses the frequencies of network motifs in word co-occurrence networks as features for authorship attribution and shows that they can distinguish different authors' writing styles, with a best classification accuracy of 57.5%.

Contribution

It applies directed three-node motifs as features for authorship attribution and verifies their effectiveness with four machine learning methods.

Findings

01. Motifs can distinguish authors' writing styles.

02. Best classification accuracy achieved was 57.5%.

03. Function words are significant in motif patterns.

Abstract

Concepts and methods of complex networks can be used to analyse texts at their different complexity levels. Examples of natural language processing (NLP) tasks studied via topological analysis of networks are keyword identification, automatic extractive summarization and authorship attribution. Even though a myriad of network measurements have been applied to study the authorship attribution problem, the use of motifs for text analysis has been restricted to a few works. The goal of this paper is to apply the concept of motifs, recurrent interconnection patterns, in the authorship attribution task. The absolute frequencies of all thirteen directed motifs with three nodes were extracted from the co-occurrence networks and used as classification features. The effectiveness of these features was verified with four machine learning methods. The results show that motifs are able to…
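The feature extraction described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that the co-occurrence network links each word to the word immediately after it, that no preprocessing is applied, and that self-loops are dropped. All function names (`cooccurrence_edges`, `canonical_form`, `all_motif_classes`, `motif_features`) are hypothetical. The sketch counts every connected three-node subgraph by its isomorphism class, which yields the thirteen absolute frequencies used as features.

```python
from collections import Counter
from itertools import combinations, permutations

# Ordered node pairs of a three-node graph: the six possible directed edges.
NODE_PAIRS = [(i, j) for i in range(3) for j in range(3) if i != j]


def cooccurrence_edges(tokens):
    """Directed edges from each word to the next one (assumed window of 1)."""
    return {(a, b) for a, b in zip(tokens, tokens[1:]) if a != b}


def canonical_form(nodes, edges):
    """Smallest 6-bit adjacency code over all orderings of the three nodes."""
    return min(
        tuple(int((p[i], p[j]) in edges) for i, j in NODE_PAIRS)
        for p in permutations(nodes)
    )


def all_motif_classes():
    """Enumerate the isomorphism classes of connected directed 3-node graphs."""
    classes = set()
    for mask in range(1, 64):
        edges = {NODE_PAIRS[k] for k in range(6) if mask >> k & 1}
        linked_pairs = {frozenset(e) for e in edges}
        # Three nodes are weakly connected iff at least two pairs are linked.
        if len(linked_pairs) >= 2:
            classes.add(canonical_form((0, 1, 2), edges))
    return sorted(classes)


def motif_features(tokens):
    """Absolute frequency of each motif class, in the order of all_motif_classes()."""
    edges = cooccurrence_edges(tokens)
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    counts = Counter()
    seen = set()
    # Every connected triple has a node adjacent to the other two.
    for centre, adjacent in neighbours.items():
        for u, v in combinations(adjacent, 2):
            triple = frozenset((centre, u, v))
            if triple not in seen:
                seen.add(triple)
                counts[canonical_form(tuple(triple), edges)] += 1
    return [counts[c] for c in all_motif_classes()]


if __name__ == "__main__":
    print(motif_features("the cat saw the dog".split()))
```

Each text then becomes a vector of thirteen counts that any standard classifier could consume. Enumerating all triples this way is only practical for small networks; a book-length text would likely need a dedicated motif-counting tool.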
