TL;DR
This paper introduces new datasets and models for authorship verification (AV) on short Italian texts. It analyzes how genre, topic, gender, and length influence attribution accuracy, and shows that AV is feasible with limited data.
Contribution
It presents novel Italian datasets for authorship verification and analyzes how genre, topic, gender, and text length affect attribution performance.
Findings
AV is feasible with limited data
Gender and topic influence attribution results
Controlling for gender and topic is important
Abstract
Existing research on Authorship Attribution (AA) focuses on texts for which a lot of data is available (e.g., novels), mainly in English. We approach AA via Authorship Verification (AV) on short Italian texts in two novel datasets, and analyze the interaction between genre, topic, gender and length. Results show that AV is feasible even with little data, but more evidence helps. Gender and topic can be indicative clues, and if not controlled for, they might override more specific aspects of personal style.
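To make the AA-via-AV framing concrete, here is a minimal sketch that is not the authors' method. It is a generic character n-gram baseline. An AV decision asks whether an unknown text shares an author with a set of known texts. AA is then recovered by verifying the unknown text against each candidate author in turn. The function names (`verify`, `attribute`), the trigram size `n=3`, and the acceptance `threshold=0.5` are all hypothetical illustration choices. In practice the threshold would be calibrated on held-out data. Pooling several known texts into one profile loosely mirrors the finding that more evidence helps.

```python
from collections import Counter
import math


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping, lowercased character n-grams in `text`."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # empty profile (e.g. text shorter than n): no evidence
    return dot / (norm_a * norm_b)


def verify(known_texts, unknown, threshold=0.5, n=3):
    """Hypothetical AV decision: same author or not?

    All known texts are pooled into a single profile, so more known
    texts give a richer profile. Returns (accepted, score).
    """
    profile = Counter()
    for t in known_texts:
        profile.update(char_ngrams(t, n))
    score = cosine(profile, char_ngrams(unknown, n))
    return score >= threshold, score


def attribute(candidates, unknown, threshold=0.5, n=3):
    """AA via AV: verify `unknown` against each candidate author.

    `candidates` maps author name -> list of known texts. Returns the
    highest-scoring author whose verification succeeds, or None if no
    candidate passes the (hypothetical) threshold.
    """
    best_author, best_score = None, -1.0
    for author, texts in candidates.items():
        accepted, score = verify(texts, unknown, threshold, n)
        if accepted and score > best_score:
            best_author, best_score = author, score
    return best_author


if __name__ == "__main__":
    authors = {
        "A": ["ciao a tutti, come state?"],
        "B": ["zzzz qqqq xxxx"],
    }
    print(attribute(authors, "ciao a tutti, come state?"))  # prints "A"
```

Character n-grams capture topic vocabulary as well as style. A baseline like this is therefore one place where uncontrolled topic (or gender-correlated word choice) could leak into the score, which is the risk the abstract points to.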
