The Topic Confusion Task: A Novel Scenario for Authorship Attribution

Malik H. Altakrori; Jackie Chi Kit Cheung; Benjamin C. M. Fung

arXiv:2104.08530 · cs.CL · September 10, 2021

Open Access

TL;DR

The paper introduces the topic confusion task for authorship attribution, which separates errors caused by a topic shift from errors caused by the features' failure to capture writing style. Stylometric features with part-of-speech tags prove the most robust to topic variation, while pretrained language models such as BERT and RoBERTa perform poorly.

Contribution

It proposes a novel topic confusion scenario for authorship attribution to analyze error sources and evaluates stylometric features versus pretrained models under topic shifts.

Findings

01. Stylometric features with part-of-speech (POS) tags are the least susceptible to topic variations.

02. Combining features reduces topic confusion and improves attribution accuracy.

03. Pretrained language models such as BERT and RoBERTa perform poorly compared to simple n-gram features.

Abstract

Authorship attribution is the problem of identifying the most plausible author of an anonymous text from a set of candidate authors. Researchers have investigated same-topic and cross-topic scenarios of authorship attribution, which differ according to whether new, unseen topics are used in the testing phase. However, neither scenario allows us to explain whether errors are caused by a failure to capture authorship writing style or by a topic shift. Motivated by this, we propose the topic confusion task where we switch the author-topic configuration between the training and testing sets. This setup allows us to distinguish two types of errors: those caused by the topic shift and those caused by the features' inability to capture the writing styles. We show that stylometric features with part-of-speech tags are the least susceptible to topic variations. We further show that…
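
The switched author-topic configuration described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `(author, topic, text)` tuple layout, the `train_topic_of` mapping, and the error labels `topic_confusion` and `style_error` are all hypothetical, and the rule used to tell the two error types apart is an assumption about how the setup might be operationalized.

```python
def topic_confusion_split(docs, train_topic_of):
    """Split documents so each author's training topic differs from their test topics.

    docs: list of (author, topic, text) tuples (hypothetical layout).
    train_topic_of: dict mapping each author to the single topic used for training.
    """
    train, test = [], []
    for author, topic, text in docs:
        if author not in train_topic_of:
            continue  # skip authors that are not candidates
        if topic == train_topic_of[author]:
            train.append((author, topic, text))
        else:
            test.append((author, topic, text))
    return train, test


def categorize_prediction(true_author, predicted_author, test_topic, train_topic_of):
    """Label a prediction as correct, a topic-driven error, or a style-driven error.

    Assumption: an error counts as topic confusion when the wrongly predicted
    author was trained on the same topic as the test document.
    """
    if predicted_author == true_author:
        return "correct"
    if train_topic_of.get(predicted_author) == test_topic:
        return "topic_confusion"
    return "style_error"
```

For example, with authors `a1` and `a2` trained on politics and `b1` and `b2` trained on sports, a sports document by `a1` that is attributed to `b1` would be labeled `topic_confusion` under this assumed rule, while attributing it to `a2` would be labeled `style_error`.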

Taxonomy

Topics: Authorship Attribution and Profiling · Topic Modeling · Natural Language Processing Techniques

Methods: Linear Layer · Dropout · Adam · Dense Connections · Attention Is All You Need · Softmax · Linear Warmup With Linear Decay · RoBERTa · WordPiece