Deep Bayes Factor Scoring for Authorship Verification

Benedikt Boenninghoff, Julian Rupp, Robert M. Nickel, and Dorothea Kolossa

arXiv:2008.10105·cs.CL·August 25, 2020·5 cites

TL;DR

This paper introduces a hierarchical deep learning approach that combines deep metric learning with Bayes factor scoring for authorship verification on the cross-topic fanfiction data of the PAN 2020 challenge, with the aim of improving verification accuracy.

Contribution

It presents a novel end-to-end framework that fuses deep metric learning with Bayes factor scoring for authorship verification and addresses the cross-topic challenge.
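The metric-learning half of this framework maps a document of any length onto a fixed-size vector and trains the mapping so that same-author documents end up close together. The sketch below illustrates only that idea under loud assumptions: `token_vector` is a hypothetical stand-in for learned token embeddings, mean pooling replaces the paper's deep encoder, and the loss is the standard contrastive loss rather than whatever exact objective the authors use.

```python
import math
import random

EMBED_DIM = 8  # hypothetical embedding size


def token_vector(token: str, dim: int = EMBED_DIM) -> list[float]:
    """Hypothetical stand-in for a learned token embedding.

    Seeds a PRNG with the token string so the same token always maps
    to the same vector; a trained network would learn these instead.
    """
    rng = random.Random(token)
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def embed_document(text: str, dim: int = EMBED_DIM) -> list[float]:
    """Map a document of any length onto a fixed-size, unit-length vector.

    Mean pooling over token vectors is an assumed stand-in for a deep encoder.
    """
    tokens = text.lower().split()
    if not tokens:
        return [0.0] * dim
    pooled = [0.0] * dim
    for tok in tokens:
        for i, value in enumerate(token_vector(tok, dim)):
            pooled[i] += value / len(tokens)
    norm = math.sqrt(sum(v * v for v in pooled)) or 1.0
    return [v / norm for v in pooled]


def euclidean(u: list[float], v: list[float]) -> float:
    """Distance between two document vectors in the embedding space."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def contrastive_loss(u: list[float], v: list[float],
                     same_author: bool, margin: float = 1.0) -> float:
    """Standard contrastive loss: pull same-author pairs together and
    push different-author pairs at least `margin` apart."""
    d = euclidean(u, v)
    if same_author:
        return d * d
    return max(0.0, margin - d) ** 2
```

A four-word text and a two-hundred-word text both come out as `EMBED_DIM`-length vectors, which is what lets a downstream scoring layer compare documents of very different lengths.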

Findings

1. Effective in cross-topic authorship verification tasks
2. Hierarchical fusion improves verification accuracy
3. Provides text preprocessing strategies for the cross-topic setting
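The summary does not say which preprocessing strategies the paper uses. To illustrate the general idea of suppressing topic signal, the sketch below swaps in a named, different technique: topic masking in the spirit of Stamatatos's text distortion, where alphabetic words outside the `k` most frequent ones are replaced by asterisks of equal length. The function name `mask_topic_words` and the tokenization are assumptions for illustration, not the authors' pipeline.

```python
import re
from collections import Counter


def mask_topic_words(docs: list[str], k: int) -> list[str]:
    """Hypothetical topic masking: keep the k most frequent alphabetic words
    (mostly function words) and hide every other word's content."""
    # Split into word tokens and single punctuation marks.
    tokenized = [re.findall(r"\w+|[^\w\s]", d) for d in docs]
    # Count only alphabetic tokens, case-insensitively, across the corpus.
    counts = Counter(t.lower() for toks in tokenized for t in toks if t.isalpha())
    keep = {word for word, _ in counts.most_common(k)}
    masked_docs = []
    for toks in tokenized:
        masked = [
            t if (not t.isalpha() or t.lower() in keep) else "*" * len(t)
            for t in toks
        ]
        # Tokens are re-joined with single spaces, so punctuation is spaced out.
        masked_docs.append(" ".join(masked))
    return masked_docs
```

With `k=2` on two short stories, only "the" and "over" survive, so the output keeps word lengths and function-word patterns while hiding topical nouns and verbs.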

Abstract

The PAN 2020 authorship verification (AV) challenge focuses on a cross-topic/closed-set AV task over a collection of fanfiction texts. Fanfiction is a fan-written extension of a storyline in which a so-called fandom topic describes the principal subject of the document. The data provided in the PAN 2020 AV task is quite challenging because it includes authors whose texts span multiple different fandom topics. In this work, we present a hierarchical fusion of two well-known approaches into a single end-to-end learning procedure: a deep metric learning framework at the bottom aims to learn a pseudo-metric that maps a document of variable length onto a fixed-size feature vector. At the top, we incorporate a probabilistic layer to perform Bayes factor scoring in the learned metric space. We also provide text preprocessing strategies to deal with the cross-topic issue.
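Bayes factor scoring compares how likely a pair of feature vectors is under a same-author hypothesis versus a different-author hypothesis. The sketch below assumes a two-covariance Gaussian model with diagonal covariances, as used in PLDA-style scoring for speaker verification: each vector is `mean + y + e`, with a shared author component `y ~ N(0, between_var)` and per-document noise `e ~ N(0, within_var)`. This model, the function names, and the prior handling are illustrative assumptions, not necessarily the exact probabilistic layer of the paper.

```python
import math


def log_bayes_factor(x1, x2, between_var, within_var, mean=None) -> float:
    """log p(x1, x2 | same author) - log p(x1, x2 | different authors)
    under an assumed diagonal two-covariance Gaussian model."""
    if mean is None:
        mean = [0.0] * len(x1)
    llr = 0.0
    for u, v, m, b, w in zip(x1, x2, mean, between_var, within_var):
        if b < 0 or w <= 0:
            raise ValueError("need between_var >= 0 and within_var > 0")
        u, v = u - m, v - m
        a = b + w  # total variance of one vector in this dimension
        c = b      # cross-covariance when both share the author component
        det_same = a * a - c * c  # equals w * (2b + w) > 0
        q_same = (a * u * u - 2.0 * c * u * v + a * v * v) / det_same
        q_diff = (u * u + v * v) / a  # independent vectors, covariance diag(a, a)
        # The -log(2*pi) terms of both bivariate Gaussians cancel.
        llr += 0.5 * (math.log(a * a) - math.log(det_same)) - 0.5 * q_same + 0.5 * q_diff
    return llr


def posterior_same_author(llr: float, prior: float = 0.5) -> float:
    """Turn a log Bayes factor into P(same author) via a numerically stable sigmoid."""
    z = llr + math.log(prior / (1.0 - prior))
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)
```

For example, with `between_var=[1.0, 1.0]` and `within_var=[0.1, 0.1]`, the pair `[1.0, -1.0]` and `[1.0, -1.0]` gets a positive score, while `[1.0, -1.0]` and `[-1.0, 1.0]` gets a strongly negative one. Because the score is a log-likelihood ratio, it can be combined with any prior to give a calibrated same-author probability.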

Taxonomy

Topics: Authorship Attribution and Profiling · Topic Modeling · Natural Language Processing Techniques