Loci Similes: A Benchmark for Extracting Intertextualities in Latin Literature
Julian Schelb, Michael Wittweiler, Marie Revellio, Barbara Feichtinger, Andreas Spitz

TL;DR
This paper introduces Loci Similes, a benchmark dataset for detecting intertextualities in Latin literature that enables standardized evaluation of language models on this task.
Contribution
The paper presents Loci Similes, the first large-scale benchmark dataset for Latin intertextuality detection, along with baseline results using advanced language models.
Findings
Established baseline performance for intertextuality detection
Curated dataset of approximately 172,000 text segments containing 545 expert-verified parallels
Demonstrated effectiveness of language models in Latin intertextuality tasks
Abstract
Tracing connections between historical texts is an important part of intertextual research, enabling scholars to reconstruct the virtual library of a writer and identify the sources influencing their creative process. These intertextual links manifest in diverse forms, ranging from direct verbatim quotations to subtle allusions and paraphrases disguised by morphological variation. Language models offer a promising path forward due to their capability of capturing semantic similarity beyond lexical overlap. However, the development of new methods for this task is held back by the scarcity of standardized benchmarks and easy-to-use datasets. We address this gap by introducing Loci Similes, a benchmark for Latin intertextuality detection comprising a curated dataset of ~172k text segments containing 545 expert-verified parallels linking Late Antique authors to a corpus of classical…
Code & Models
- 🤗 julian-schelb/SPhilBerta-emb-lat-intertext-v1 · model · 82 dl · ♡ 1
- 🤗 julian-schelb/multilingual-e5-small-emb-lat-intertext-v1 · model · 68 dl
- 🤗 julian-schelb/multilingual-e5-base-emb-lat-intertext-v1 · model · 67 dl
- 🤗 julian-schelb/multilingual-e5-large-emb-lat-intertext-v1 · model · 136 dl
- 🤗 julian-schelb/granite-embedding-107m-emb-lat-intertext-v1 · model · 65 dl
- 🤗 julian-schelb/granite-embedding-278m-emb-lat-intertext-v1 · model · 65 dl
- 🤗 julian-schelb/bge-m3-emb-lat-intertext-v1 · model · 56 dl
- 🤗 julian-schelb/mmbert-small-class-lat-intertext-v1 · model · 86 dl
- 🤗 julian-schelb/mmbert-base-class-lat-intertext-v1 · model · 94 dl
- 🤗 julian-schelb/bert-romanian-class-lat-intertext-v1 · model · 99 dl
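The "emb" in several of the model names above suggests embedding models, which would typically be used to rank candidate source passages by vector similarity. That reading is an assumption, not something this page states. As a minimal sketch of that general idea, the following ranks toy vectors by cosine similarity. The vectors, the segment identifiers, and the function names `cosine` and `rank_candidates` are hypothetical and not taken from the paper or the released models.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)


def rank_candidates(query_vec, candidate_vecs, top_k=3):
    """Return the top_k (segment_id, score) pairs, highest score first."""
    scored = [(seg_id, cosine(query_vec, vec)) for seg_id, vec in candidate_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy embeddings; a real model would produce these from text.
query = [1.0, 0.0, 1.0]
candidates = {
    "classical_a": [1.0, 0.0, 1.0],
    "classical_b": [0.0, 1.0, 0.0],
    "classical_c": [1.0, 1.0, 1.0],
}

ranking = rank_candidates(query, candidates, top_k=2)
for seg_id, score in ranking:
    print(f"{seg_id}: {score:.3f}")
```

In practice the toy vectors would be replaced by embeddings of a Late Antique query segment and of the classical candidate segments. How the released models are meant to be loaded and evaluated is documented by their authors, not here.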
Taxonomy
Topics: Authorship Attribution and Profiling · Digital Humanities and Scholarship · Topic Modeling
