Loci Similes: A Benchmark for Extracting Intertextualities in Latin Literature

Julian Schelb; Michael Wittweiler; Marie Revellio; Barbara Feichtinger; Andreas Spitz

arXiv:2601.07533 · cs.IR · January 30, 2026

TL;DR

This paper introduces Loci Similes, a benchmark for detecting intertextualities in Latin literature. It pairs a curated dataset of roughly 172k text segments with 545 expert-verified parallels, giving language models a standardized basis for evaluation on this task.

Contribution

The paper presents Loci Similes, a benchmark dataset for Latin intertextuality detection, along with baseline results obtained with language models.

Findings

01

Established baseline performance for intertextuality detection

02

Curated dataset of approximately 172,000 text segments containing 545 expert-verified parallels

03

Demonstrated effectiveness of language models in Latin intertextuality tasks

Abstract

Tracing connections between historical texts is an important part of intertextual research, enabling scholars to reconstruct the virtual library of a writer and identify the sources influencing their creative process. These intertextual links manifest in diverse forms, ranging from direct verbatim quotations to subtle allusions and paraphrases disguised by morphological variation. Language models offer a promising path forward due to their capability of capturing semantic similarity beyond lexical overlap. However, the development of new methods for this task is held back by the scarcity of standardized benchmarks and easy-to-use datasets. We address this gap by introducing Loci Similes, a benchmark for Latin intertextuality detection comprising a curated dataset of ~172k text segments containing 545 expert-verified parallels linking Late Antique authors to a corpus of classical…
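To make the abstract's point about lexical overlap concrete, here is a minimal sketch, not the paper's method, comparing two simple overlap scores on a toy pair. The function names are hypothetical, and the "inflected" string is an invented variation on the opening of the Aeneid, used only for illustration. Exact word matching scores zero once every word is inflected differently, while character n-grams recover only part of the signal.

```python
def word_jaccard(a: str, b: str) -> float:
    """Jaccard overlap of lowercase word sets (exact-form matching)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def char_ngrams(text: str, n: int = 3) -> set[str]:
    """Set of character n-grams over whitespace-normalized, lowercased text."""
    s = " ".join(text.lower().split())
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def ngram_jaccard(a: str, b: str, n: int = 3) -> float:
    """Jaccard overlap of character n-gram sets (tolerates shared stems)."""
    ga, gb = char_ngrams(a, n), char_ngrams(b, n)
    union = ga | gb
    return len(ga & gb) / len(union) if union else 0.0


source = "arma virumque cano"      # Aeneid 1.1 (opening words)
verbatim = "arma virumque cano"    # direct quotation
inflected = "armis virum canebat"  # invented variant, illustration only

for label, candidate in [("verbatim", verbatim), ("inflected", inflected)]:
    print(label,
          round(word_jaccard(source, candidate), 2),
          round(ngram_jaccard(source, candidate), 2))
```

Surface measures like these catch verbatim quotations but degrade quickly under inflection and paraphrase. That gap is the motivation the abstract gives for evaluating language models on an expert-annotated benchmark.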

Taxonomy

Topics: Authorship Attribution and Profiling · Digital Humanities and Scholarship · Topic Modeling