Structure-Aware Text Recognition for Ancient Greek Critical Editions
Nicolas Angleraud, Antonia Karamolegkou, Benoît Sagot, Thibault Clérice

TL;DR
This paper evaluates the ability of visual language models to interpret complex layout semantics in ancient Greek scholarly texts, introducing new datasets and demonstrating current limitations and potential improvements.
Contribution
It introduces large-scale synthetic and real datasets for structure-aware text recognition in historical documents and evaluates state-of-the-art models, highlighting their limitations and potential.
Findings
Qwen3VL-8B achieves 1.0% median CER on real scans.
Current VLMs underperform compared to traditional software in structured documents.
Significant room for improvement remains for VLMs on complex scholarly texts.
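The findings above are reported as a median character error rate (CER). The following is a minimal sketch of how a per-page CER and its median could be computed, not the authors' evaluation code. It assumes CER is the Levenshtein distance divided by the reference length, and that both strings are normalized to Unicode NFC first (a common precaution for polytonic Greek, but an assumption here). The function names `levenshtein`, `cer`, and `median_cer` are hypothetical.

```python
import statistics
import unicodedata


def levenshtein(a: str, b: str) -> int:
    """Edit distance where insertion, deletion, and substitution each cost 1."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate of one page, relative to the reference length."""
    # Assumption: NFC normalization, so precomposed and combining accents match.
    ref = unicodedata.normalize("NFC", reference)
    hyp = unicodedata.normalize("NFC", hypothesis)
    if not ref:
        return 0.0 if not hyp else 1.0
    return levenshtein(ref, hyp) / len(ref)


def median_cer(pairs) -> float:
    """Median of per-page CERs over (reference, hypothesis) pairs."""
    return statistics.median(cer(ref, hyp) for ref, hyp in pairs)
```

A median is less sensitive than a mean to a few badly recognized pages, which may be one reason to report it; the paper's exact normalization and aggregation choices are not stated in this summary.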
Abstract
Recent advances in visual language models (VLMs) have transformed end-to-end document understanding. However, their ability to interpret the complex layout semantics of historical scholarly texts remains limited. This paper investigates structure-aware text recognition for Ancient Greek critical editions, which have dense reference hierarchies and extensive marginal annotations. We introduce two novel resources: (i) a large-scale synthetic corpus of 185,000 page images generated from TEI/XML sources with controlled typographic and layout variation, and (ii) a curated benchmark of real scanned editions spanning more than a century of editorial and typographic practices. Using these datasets, we evaluate three state-of-the-art VLMs under both zero-shot and fine-tuning regimes. Our experiments reveal substantial limitations in current VLM architectures when confronted with highly…
Taxonomy
Topics: Handwritten Text Recognition Techniques · Digital Humanities and Scholarship · Multimodal Machine Learning Applications
