Deep Bayes Factor Scoring for Authorship Verification
Benedikt Boenninghoff, Julian Rupp, Robert M. Nickel, and Dorothea Kolossa

TL;DR
This paper introduces a hierarchical deep learning approach that combines deep metric learning with Bayes factor scoring for authorship verification on the cross-topic fanfiction data of the PAN 2020 challenge, with the aim of improving verification accuracy.
Contribution
It presents a novel end-to-end framework that fuses deep metric learning with Bayesian scoring for authorship verification, addressing cross-topic challenges.
Findings
Effective in cross-topic authorship verification tasks
Hierarchical fusion improves verification accuracy
Provides robust text preprocessing strategies
Abstract
The PAN 2020 authorship verification (AV) challenge focuses on a cross-topic/closed-set AV task over a collection of fanfiction texts. Fanfiction is a fan-written extension of a storyline in which a so-called fandom topic describes the principal subject of the document. The data provided in the PAN 2020 AV task is quite challenging because authors of texts across multiple/different fandom topics are included. In this work, we present a hierarchical fusion of two well-known approaches into a single end-to-end learning procedure: A deep metric learning framework at the bottom aims to learn a pseudo-metric that maps a document of variable length onto a fixed-sized feature vector. At the top, we incorporate a probabilistic layer to perform Bayes factor scoring in the learned metric space. We also provide text preprocessing strategies to deal with the cross-topic issue.
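The top layer described above can be illustrated with a minimal sketch of Bayes factor scoring on two already-computed document embeddings. It is not the authors' implementation. The Gaussian model (a shared mean `mu`, a between-author variance `b`, and a within-author variance `w`, all independent per dimension) is an assumption made for illustration and is not stated in the abstract. The function names and the hand-written vectors are hypothetical stand-ins for the output of the learned metric network.

```python
import math


def log_bayes_factor(y1, y2, mu=0.0, b=1.0, w=0.1):
    """Log Bayes factor for 'same author' vs. 'different authors'.

    Assumed model (illustrative only): each embedding dimension is
    author_mean + noise, with author_mean ~ N(mu, b) and noise ~ N(0, w).
    Under 'same author' the two embeddings share author_mean, so they are
    correlated with covariance b; under 'different authors' they are
    independent with variance b + w each.
    """
    t = b + w                      # marginal variance of one embedding
    det = w * (2.0 * b + w)        # determinant of [[t, b], [b, t]]
    log_2pi = math.log(2.0 * math.pi)
    total = 0.0
    for a, c in zip(y1, y2):
        d1, d2 = a - mu, c - mu
        # Joint Gaussian log-density under the same-author hypothesis.
        quad_same = (t * (d1 * d1 + d2 * d2) - 2.0 * b * d1 * d2) / det
        log_same = -0.5 * (2.0 * log_2pi + math.log(det) + quad_same)
        # Product of two independent Gaussians under the alternative.
        quad_diff = (d1 * d1 + d2 * d2) / t
        log_diff = -0.5 * (2.0 * log_2pi + 2.0 * math.log(t) + quad_diff)
        total += log_same - log_diff
    return total


def same_author_probability(log_bf, prior_same=0.5):
    """Turn a log Bayes factor into a posterior probability of 'same author'."""
    log_odds = log_bf + math.log(prior_same / (1.0 - prior_same))
    # Numerically stable logistic function.
    if log_odds >= 0.0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)


if __name__ == "__main__":
    close_pair = log_bayes_factor([1.0, -0.5], [0.9, -0.4])
    far_pair = log_bayes_factor([1.0, -0.5], [-1.0, 0.6])
    print(same_author_probability(close_pair))
    print(same_author_probability(far_pair))
```

In this sketch, a positive log Bayes factor favors the same-author hypothesis and a negative one favors different authors. In an end-to-end system of the kind the abstract describes, the embeddings and the parameters of the probabilistic layer would be learned jointly rather than fixed by hand as they are here.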
Taxonomy
Topics: Authorship Attribution and Profiling · Topic Modeling · Natural Language Processing Techniques
