Transcribing Spanish Texts from the Past: Experiments with Transkribus, Tesseract and Granite

Yanco Amor Torterolo-Orta; Jaione Macicior-Mitxelena; Marina Miguez-Lamanuzzi; Ana García-Serrano

arXiv:2507.04878 · cs.CV · July 8, 2025

TL;DR

This study compares different OCR approaches for transcribing historical Spanish texts, demonstrating the feasibility of using consumer hardware and highlighting areas for future improvement in OCR accuracy.

Contribution

The paper presents a comparative analysis of web-based, traditional, and multimodal OCR methods applied to Spanish historical texts using accessible hardware.

Findings

01 All OCR methods produced satisfactory results
02 Consumer-grade hardware is sufficient for OCR experiments
03 Further improvements are needed for higher accuracy

Abstract

This article presents the experiments and results obtained by the GRESEL team in the IberLEF 2025 shared task PastReader: Transcribing Texts from the Past. Three types of experiments were conducted with the dual aim of participating in the task and enabling comparisons across different approaches. These included the use of a web-based OCR service, a traditional OCR engine, and a compact multimodal model. All experiments were run on consumer-grade hardware, which, despite lacking high-performance computing capacity, provided sufficient storage and stability. The results, while satisfactory, leave room for further improvement. Future work will focus on exploring new techniques and ideas using the Spanish-language dataset provided by the shared task, in collaboration with Biblioteca Nacional de España (BNE).
