Open Set Authorship Attribution toward Demystifying Victorian Periodicals

Sarkhan Badirli; Mary Borgo Ton; Abdulmecit Gungor; Murat Dundar

arXiv:1912.08259·cs.CL·December 19, 2019

TL;DR

This paper explores authorship attribution in Victorian literature, highlighting the challenges of open-set classification with many candidate authors and evaluating the effectiveness of standard machine learning methods.

Contribution

It introduces a new dataset for Victorian texts and analyzes the limitations of existing methods in open-set authorship attribution scenarios.

Findings

01. Linear classifiers perform well in closed-set attribution.

02. Standard methods struggle with large candidate pools in open-set scenarios.

03. Robust approaches are needed for real-world authorship attribution.

Abstract

Existing research in computational authorship attribution (AA) has primarily focused on attribution tasks with a limited number of authors in a closed-set configuration. This restricted set-up is far from being realistic in dealing with highly entangled real-world AA tasks that involve a large number of candidate authors for attribution during test time. In this paper, we study AA in historical texts using a new data set compiled from the Victorian literature. We investigate the predictive capacity of most common English words in distinguishing writings of most prominent Victorian novelists. We challenged the closed-set classification assumption and discussed the limitations of standard machine learning techniques in dealing with the open set AA task. Our experiments suggest that a linear classifier can achieve near perfect attribution accuracy under closed set assumption yet, the need…
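
To make the closed-set versus open-set distinction concrete, here is a minimal sketch and not the authors' implementation. It assumes a tiny hypothetical vocabulary of common English words, swaps in a nearest-centroid classifier with cosine similarity as a stand-in for the linear classifier, and uses a plain similarity threshold as one simple way to reject unknown authors. The author labels, texts, and the threshold value are all illustrative.

```python
import math
from collections import Counter

# Hypothetical, tiny vocabulary of common English words (an assumption;
# the paper's actual word list is not given in this summary).
VOCAB = ["the", "and", "of", "to", "a", "in", "that", "it", "was", "he"]


def features(text):
    """Relative frequency of each vocabulary word in the text."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = max(len(tokens), 1)
    return [counts[w] / total for w in VOCAB]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def fit_centroids(samples):
    """samples: list of (author, text). Returns the mean feature vector per author."""
    grouped = {}
    for author, text in samples:
        grouped.setdefault(author, []).append(features(text))
    return {
        author: [sum(col) / len(vecs) for col in zip(*vecs)]
        for author, vecs in grouped.items()
    }


def predict(centroids, text, threshold=None):
    """Closed set (threshold=None): always returns the most similar known author.
    Open set: returns "UNKNOWN" when the best similarity is below the threshold."""
    x = features(text)
    author, score = max(
        ((a, cosine(x, c)) for a, c in centroids.items()),
        key=lambda pair: pair[1],
    )
    if threshold is not None and score < threshold:
        return "UNKNOWN"
    return author


# Illustrative toy data with hypothetical author labels.
train = [
    ("author_a", "the the the of of a"),
    ("author_b", "it was he and it was"),
]
centroids = fit_centroids(train)

closed_guess = predict(centroids, "in that to in that to")
open_guess = predict(centroids, "in that to in that to", threshold=0.5)
```

In the closed-set call, the classifier is forced to name one of the known authors even though the test text shares no vocabulary usage with either of them. The thresholded call can decline to attribute, which is the behavior an open-set task requires.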

Code & Models

Repositories

sbadirli/Open-Set-Authorship-Attribution
