Same or Different? Diff-Vectors for Authorship Analysis

Silvia Corbara; Alejandro Moreo; Fabrizio Sebastiani

arXiv:2301.09862 · cs.LG · January 25, 2023

TL;DR

This paper studies Diff-Vectors, a representation for authorship analysis in which each feature vector describes a pair of documents rather than a single document. The authors report improved authorship identification performance, especially when training data is limited.

Contribution

The paper systematically studies Diff-Vectors for authorship identification tasks, reports their advantages over traditional single-document feature vectors, and proposes new methods for authorship verification and attribution.

Findings

01

Diff-Vectors improve authorship identification accuracy.

02

Diff-Vectors are especially effective with scarce training data.

03

The paper proposes new methods for authorship verification and attribution using Diff-Vectors.

Abstract

We investigate the effects on authorship identification tasks of a fundamental shift in how to conceive the vectorial representations of documents that are given as input to a supervised learner. In "classic" authorship analysis a feature vector represents a document, the value of a feature represents (an increasing function of) the relative frequency of the feature in the document, and the class label represents the author of the document. We instead investigate the situation in which a feature vector represents an unordered pair of documents, the value of a feature represents the absolute difference in the relative frequencies (or increasing functions thereof) of the feature in the two documents, and the class label indicates whether the two documents are from the same author or not. This latter (learner-independent) type of representation has been occasionally used before, but has…
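The pairwise representation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the whitespace tokenizer, the tiny function-word vocabulary, the toy documents, and the helper names (`relative_frequencies`, `diff_vector`, `build_pairs`) are all assumptions made for illustration, and raw relative frequencies are used in place of any increasing function of them.

```python
from collections import Counter
from itertools import combinations


def relative_frequencies(text, vocabulary):
    """Relative frequency of each vocabulary item among the tokens of `text`."""
    tokens = text.lower().split()  # assumption: naive whitespace tokenization
    counts = Counter(tokens)
    total = len(tokens) or 1  # guard against empty documents
    return [counts[word] / total for word in vocabulary]


def diff_vector(doc_a, doc_b, vocabulary):
    """Absolute difference of relative frequencies, feature by feature.

    The result is the same for (doc_a, doc_b) and (doc_b, doc_a),
    which matches the idea of an unordered pair of documents.
    """
    freq_a = relative_frequencies(doc_a, vocabulary)
    freq_b = relative_frequencies(doc_b, vocabulary)
    return [abs(a - b) for a, b in zip(freq_a, freq_b)]


def build_pairs(docs, authors, vocabulary):
    """One diff-vector per unordered document pair; label 1 = same author."""
    X, y = [], []
    for i, j in combinations(range(len(docs)), 2):
        X.append(diff_vector(docs[i], docs[j], vocabulary))
        y.append(1 if authors[i] == authors[j] else 0)
    return X, y


# Hypothetical toy data, for illustration only.
vocabulary = ["the", "of", "and"]
docs = [
    "the cat and the dog",
    "the end of the road",
    "of mice and of men",
]
authors = ["A", "A", "B"]

X, y = build_pairs(docs, authors, vocabulary)
for vector, label in zip(X, y):
    print([round(v, 2) for v in vector], "same" if label else "different")
```

Because the abstract describes the representation as learner-independent, the resulting `X` and `y` could in principle be passed to any binary classifier. Note that n documents yield n(n-1)/2 pairs, so the number of training examples grows quadratically with the number of documents.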

Code & Models

Repositories

alexmoreo/diff-vectors (Official)

Taxonomy

Topics: Authorship Attribution and Profiling · Hate Speech and Cyberbullying Detection