MedLatinEpi and MedLatinLit: Two Datasets for the Computational Authorship Analysis of Medieval Latin Texts

Silvia Corbara; Alejandro Moreo; Fabrizio Sebastiani; Mirko Tavoni

arXiv:2006.12289 · cs.CL · September 22, 2021 · 1 citation

Open Access

TL;DR

This paper introduces two curated datasets of medieval Latin texts, MedLatinEpi and MedLatinLit, designed for computational authorship analysis, and provides baseline experimental results on authorship verification tasks.

Contribution

The paper presents new datasets for medieval Latin authorship analysis and offers baseline experiments and source code for reproducibility.

Findings

01 Baseline authorship verification results on the datasets.

02 Application of the system to disputed medieval epistles.

03 Provision of datasets and code for future research.

Abstract

We present and make available MedLatinEpi and MedLatinLit, two datasets of medieval Latin texts to be used in research on computational authorship analysis. MedLatinEpi and MedLatinLit consist of 294 and 30 curated texts, respectively, labelled by author; MedLatinEpi texts are of epistolary nature, while MedLatinLit texts consist of literary comments and treatises about various subjects. As such, these two datasets lend themselves to supporting research in authorship analysis tasks, such as authorship attribution, authorship verification, or same-author verification. Along with the datasets we provide experimental results, obtained on these datasets, for the authorship verification task, i.e., the task of predicting whether a text of unknown authorship was written by a candidate author or not. We also make available the source code of the authorship verification system we have used,…
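The authorship verification task defined in the abstract (deciding whether a text of unknown authorship was written by a candidate author) can be illustrated with a toy sketch. This is not the system the authors release. It is a generic baseline that assumes character trigram counts as features and cosine similarity against a profile built from the candidate's known texts. The function names `char_ngrams`, `cosine`, and `verify`, and the threshold of 0.5, are hypothetical choices made for illustration only.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text, n=3):
    """Count overlapping character n-grams in a lowercased, whitespace-normalized text."""
    text = " ".join(text.lower().split())
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def verify(unknown_text, candidate_texts, threshold=0.5):
    """Toy verifier: accept the candidate author if similarity reaches the threshold.

    The threshold is an arbitrary illustrative value, not a tuned one.
    Returns a (decision, score) pair.
    """
    profile = Counter()
    for text in candidate_texts:
        profile.update(char_ngrams(text))  # pool all known texts into one profile
    score = cosine(char_ngrams(unknown_text), profile)
    return score >= threshold, score


if __name__ == "__main__":
    # Made-up example strings, not drawn from MedLatinEpi or MedLatinLit.
    known = ["gallia est omnis divisa in partes tres"]
    decision, score = verify("gallia est omnis divisa in partes tres", known)
    print(decision, round(score, 3))
```

In practice the threshold would have to be chosen on held-out labelled texts, and a trained classifier would normally replace the fixed similarity cut-off; the sketch only shows the yes/no shape of the verification decision.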

Taxonomy

Topics: Authorship Attribution and Profiling · Natural Language Processing Techniques · Text Readability and Simplification