The Importance of Suppressing Domain Style in Authorship Analysis
Sebastian Bischoff, Niklas Deckers, Marcel Schliebs, Ben Thies, Matthias Hagen, Efstathios Stamatatos, Benno Stein, Martin Potthast

TL;DR
This paper investigates how domain-specific styles influence authorship analysis and demonstrates that domain-adversarial learning significantly improves robustness against domain shifts, outperforming heuristic methods.
Contribution
It introduces a novel experimental setup for assessing domain influence in authorship analysis and proposes effective domain-adversarial learning techniques to mitigate domain effects.
Findings
Character trigram features are highly affected by domain changes.
Domain-adversarial learning reduces accuracy loss to under 4%.
Heuristic domain-removal methods are less effective than learned approaches.
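To make the first finding concrete, here is a minimal sketch of a character trigram frequency representation. It is an illustration only, not the authors' feature pipeline: the function name `char_trigram_frequencies` and the choice of overlapping, case-sensitive trigrams normalized to relative frequencies are assumptions made for this example.

```python
from collections import Counter


def char_trigram_frequencies(text: str) -> dict[str, float]:
    """Relative frequency of each overlapping character trigram in `text`.

    Hypothetical helper for illustration; the paper's exact preprocessing
    (casing, whitespace handling, vocabulary cut-offs) is not specified here.
    """
    # Slide a window of three characters over the text, one step at a time.
    trigrams = [text[i:i + 3] for i in range(len(text) - 2)]
    total = len(trigrams)
    if total == 0:
        # Texts shorter than three characters contain no trigrams.
        return {}
    counts = Counter(trigrams)
    return {gram: n / total for gram, n in counts.items()}


# Example input chosen for illustration only.
freqs = char_trigram_frequencies("the theme")
```

Because such features count raw character sequences, they also pick up vocabulary and formatting typical of a domain, which is one plausible reason they shift when the domain changes.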
Abstract
The prerequisite of many approaches to authorship analysis is a representation of writing style. But despite decades of research, it still remains unclear to what extent commonly used and widely accepted representations like character trigram frequencies actually represent an author's writing style, in contrast to more domain-specific style components or even topic. We address this shortcoming for the first time in a novel experimental setup of fixed authors but swapped domains between training and testing. With this setup, we reveal that approaches using character trigram features are highly susceptible to favoring domain information when applied without attention to domains, suffering drops of up to 55.4 percentage points in classification accuracy under domain swapping. We further propose a new remedy based on domain-adversarial learning and compare it to ones from the literature based…
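The "fixed authors but swapped domains" setup can be pictured with a small sketch. This is one possible reading of the abstract, not the paper's actual protocol: the `Doc` record, the `domain_swap_split` function, the `train_domain_of` mapping, and the sample authors and domains are all hypothetical.

```python
from typing import NamedTuple


class Doc(NamedTuple):
    author: str
    domain: str
    text: str


def domain_swap_split(docs, train_domain_of):
    """Split documents so each author is trained on one domain and tested on the rest.

    `train_domain_of` maps an author to the single domain used for training
    (an assumption of this sketch). Authors missing from the mapping are skipped.
    """
    train, test = [], []
    for doc in docs:
        if doc.author not in train_domain_of:
            continue
        if doc.domain == train_domain_of[doc.author]:
            train.append(doc)
        else:
            test.append(doc)
    return train, test


# Hypothetical toy corpus: two authors, each writing in two domains.
docs = [
    Doc("alice", "reviews", "Loved the pacing of this novel."),
    Doc("alice", "forum", "Has anyone tried the new firmware?"),
    Doc("bob", "reviews", "The battery life is disappointing."),
    Doc("bob", "forum", "I would restart the router first."),
]

# Alice is trained on reviews and Bob on forum posts; at test time the domains swap.
train, test = domain_swap_split(docs, {"alice": "reviews", "bob": "forum"})
```

In this arrangement, domain and author are perfectly correlated in the training data, so a classifier that keys on domain cues looks accurate during training and then fails on the swapped test set.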
Taxonomy
Topics: Authorship Attribution and Profiling · Names, Identity, and Discrimination Research · Hate Speech and Cyberbullying Detection
