Detecting Latin in Historical Books with Large Language Models: A Multimodal Benchmark

Yu Wu; Ke Shu; Jonas Fischer; Lidia Pivovarova; David Rosson; Eetu Mäkelä; Mikko Tolonen

arXiv:2510.19585·cs.CL·February 9, 2026

TL;DR

This paper introduces a new task of detecting Latin fragments in historical documents using large language models, providing a benchmark dataset and evaluating model performance in a challenging, multimodal setting.

Contribution

The paper presents a novel multimodal benchmark dataset and evaluates large foundation models on Latin detection in noisy, mixed-language historical texts, establishing a baseline for future research.

Findings

1. Zero-shot models can reliably detect Latin fragments.
2. Current models lack a deep understanding of the Latin language.
3. The benchmark dataset and code are publicly available.

Abstract

This paper presents a novel task of extracting low-resourced and noisy Latin fragments from mixed-language historical documents with varied layouts. We benchmark and evaluate the performance of large foundation models against a multimodal dataset of 724 annotated pages. The results demonstrate that reliable Latin detection with contemporary zero-shot models is achievable, yet these models lack a functional comprehension of Latin. This study establishes a comprehensive baseline for processing Latin within mixed-language corpora, supporting quantitative analysis in intellectual history and historical linguistics. Both the dataset and code are available at https://github.com/COMHIS/EACL26-detect-latin.
