Stylometry for Noisy Medieval Data: Evaluating Paul Meyer's Hagiographic Hypothesis
Jean-Baptiste Camps, Thibault Clérice, Ariane Pinche

TL;DR
This paper combines handwritten text recognition and stylometric analysis to evaluate Paul Meyer's hypothesis about authorial groupings in medieval hagiographic texts, addressing challenges posed by scribal variation and textual errors.
Contribution
It introduces a workflow integrating handwritten text recognition with stylometry to analyze noisy medieval texts and tests Meyer's hypothesis on hagiographic groupings.
Findings
Supported the grouping hypothesis with stylometric evidence.
Identified potential authorial clusters in anonymous texts.
Demonstrated the effectiveness of combining handwritten text recognition (HTR) with stylometry for medieval data.
Abstract
Stylometric analysis of medieval vernacular texts remains a significant challenge: the extent of scribal variation, whether in spelling or more substantial, together with the variants and errors introduced during transmission, complicates the task of the would-be stylometrist. Basing the analysis on copies of several texts made by a single hand can partially mitigate these issues (Camps and Cafiero, 2013), but the limited availability of complete diplomatic transcriptions can make this difficult. In this paper, we use a workflow combining handwritten text recognition and stylometric analysis, applied to the hagiographic works contained in MS BnF, fr. 412. We seek to evaluate Paul Meyer's hypothesis about the constitution of groups of hagiographic works, and to examine potential authorial groupings in a largely anonymous corpus.
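The abstract does not state which stylometric features or distance measure the authors used. As a hypothetical illustration only, the sketch below shows one common way to compare noisy transcriptions: relative frequencies of character 3-grams, z-scored across texts and compared with a Delta-style distance (Burrows's Delta computed over character n-grams rather than words), followed by a nearest-neighbour lookup. The function names, the n-gram size, the `top_k` cutoff, the normalisation step and the sample strings are all assumptions, not the authors' pipeline.

```python
import math
from collections import Counter
from itertools import combinations


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams after light normalisation."""
    # Lowercasing and collapsing whitespace is a hypothetical, minimal
    # normalisation; it does not undo spelling variation.
    cleaned = " ".join(text.lower().split())
    return Counter(cleaned[i:i + n] for i in range(len(cleaned) - n + 1))


def relative_frequencies(counts: Counter) -> dict:
    """Turn raw n-gram counts into relative frequencies."""
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()} if total else {}


def delta_matrix(texts: dict, n: int = 3, top_k: int = 200) -> dict:
    """Pairwise Delta-style distances over the top_k most frequent n-grams.

    Each feature's relative frequency is z-scored across all texts; the
    distance between two texts is the mean absolute difference of their
    z-scores (Burrows's Delta, here on character n-grams instead of words).
    """
    names = list(texts)
    counts = {name: char_ngrams(texts[name], n) for name in names}
    profiles = {name: relative_frequencies(counts[name]) for name in names}

    corpus = Counter()
    for c in counts.values():
        corpus.update(c)
    features = [g for g, _ in corpus.most_common(top_k)]
    if not features:
        raise ValueError("texts are too short to yield any n-grams")

    z = {name: [] for name in names}
    for g in features:
        values = [profiles[name].get(g, 0.0) for name in names]
        mean = sum(values) / len(values)
        sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
        for name, v in zip(names, values):
            # A feature with zero variance carries no signal: score it 0.
            z[name].append((v - mean) / sd if sd > 0 else 0.0)

    return {
        (a, b): sum(abs(x - y) for x, y in zip(z[a], z[b])) / len(features)
        for a, b in combinations(names, 2)
    }


def nearest_neighbours(distances: dict) -> dict:
    """Map each text to the other text at the smallest distance."""
    names = sorted({name for pair in distances for name in pair})
    result = {}
    for name in names:
        candidates = [
            (d, b if a == name else a)
            for (a, b), d in distances.items()
            if name in (a, b)
        ]
        result[name] = min(candidates)[1]
    return result


if __name__ == "__main__":
    # Toy Old French snippets (invented for illustration): each pair differs
    # mainly by spelling, mimicking scribal or HTR variation.
    sample = {
        "A1": "li sainz hom fu de grant bonté et de grant pitié envers les povres",
        "A2": "li sains hons fu de grant bontei et de grant pitie envers les poures",
        "B1": "quant la dame vint a la cort del roi ele plora mout tendrement",
        "B2": "quant la dame vint a la court dou roi ele ploroit mout tenrement",
    }
    for name, neighbour in nearest_neighbours(delta_matrix(sample)).items():
        print(f"{name} -> {neighbour}")
```

On these toy strings, each text's nearest neighbour is its spelling variant (for example `sainz`/`sains`, `cort`/`court`), which suggests why character n-grams are often preferred over whole words for unnormalised or noisy text. A real study would need much longer samples and a proper clustering or validation procedure.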
Taxonomy
Topics: Authorship Attribution and Profiling · Topic Modeling · Text Readability and Simplification
