A Benchmark of State-Space Models vs. Transformers and BiLSTM-based Models for Historical Newspaper OCR

Merveilles Agbeti-messan; Thierry Paquet; Clément Chatelain; Pierrick Tranouez; Stéphane Nicolas

arXiv:2604.00725 · cs.CV · April 2, 2026


TL;DR

This paper applies Mamba, a scalable, linear-time State-Space Model (SSM), to OCR of historical newspapers, showing accuracy competitive with Transformer-based models at improved efficiency.
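To make the linear-versus-quadratic contrast concrete, here is a minimal sketch of a time-invariant diagonal linear state-space recurrence. It is a deliberate simplification, not Mamba itself: Mamba additionally makes the SSM parameters input-dependent ("selective") and computes the recurrence with a hardware-aware parallel scan. The function names and parameters are hypothetical. The point is only that the state update visits each time step once, whereas full self-attention scores every pair of positions.

```python
from typing import List


def diagonal_ssm_scan(x: List[float], a: List[float], b: List[float], c: List[float]) -> List[float]:
    """Simplified (non-selective) diagonal linear SSM over a scalar input sequence.

    h_t = a * h_{t-1} + b * x_t   (element-wise over the N state dimensions)
    y_t = sum_i c_i * h_t[i]
    A single left-to-right pass: O(L * N) time and O(N) state memory.
    """
    n = len(a)
    h = [0.0] * n
    ys = []
    for x_t in x:
        # Update every state dimension from the previous state and the current input.
        h = [a[i] * h[i] + b[i] * x_t for i in range(n)]
        # Read out a scalar output from the current state.
        ys.append(sum(c[i] * h[i] for i in range(n)))
    return ys


def attention_score_count(seq_len: int) -> int:
    """Number of pairwise scores in full self-attention: L * L (quadratic in L)."""
    return seq_len * seq_len


if __name__ == "__main__":
    # An impulse decays geometrically through the state: [1.0, 0.5, 0.25].
    print(diagonal_ssm_scan([1.0, 0.0, 0.0], a=[0.5], b=[1.0], c=[1.0]))
    # Doubling the sequence length doubles the scan cost but quadruples attention scores.
    print(attention_score_count(1000), attention_score_count(2000))
```

For a paragraph-length sequence of a few thousand characters, this gap between L and L² is what motivates SSMs for paragraph-level transcription.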

Contribution

It presents what the authors describe as the first OCR architecture based on SSMs, combining a CNN visual encoder with Mamba sequence modeling, and provides a comprehensive benchmark against Transformer- and BiLSTM-based models.
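A plain SSM scan is causal: each output depends only on earlier positions. One common way to make it bidirectional is to run the scan left-to-right and right-to-left and merge the two outputs. The sketch below assumes that scheme, with a scalar state and merging by summation; the paper's bidirectional Mamba block may combine directions differently (for example by concatenation or gating), and all names here are hypothetical.

```python
from typing import List


def causal_scan(x: List[float], a: float, b: float, c: float) -> List[float]:
    """Scalar-state linear recurrence h_t = a*h_{t-1} + b*x_t, y_t = c*h_t (left to right)."""
    h = 0.0
    out = []
    for x_t in x:
        h = a * h + b * x_t
        out.append(c * h)
    return out


def bidirectional_scan(x: List[float], a: float, b: float, c: float) -> List[float]:
    """Hypothetical bidirectional wrapper: forward scan plus reversed scan, summed per position."""
    fwd = causal_scan(x, a, b, c)
    # Scan the reversed sequence, then flip the result back into the original order.
    bwd = causal_scan(x[::-1], a, b, c)[::-1]
    return [f + g for f, g in zip(fwd, bwd)]


if __name__ == "__main__":
    seq = [0.0, 0.0, 1.0]  # a single "event" at the last position
    print(causal_scan(seq, 0.5, 1.0, 1.0))         # earlier positions cannot see it
    print(bidirectional_scan(seq, 0.5, 1.0, 1.0))  # the backward pass carries it to position 0
```

With an impulse at the last position, the forward pass alone leaves the first two outputs at zero, while the merged output propagates that information back to the start of the line, which is useful for CTC-style recognition where every frame is labeled at once.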

Findings

1. Mamba models halve inference time relative to Transformer-based models.
2. All neural models achieve a character error rate (CER) of around 2% on historical newspaper OCR.
3. Mamba maintains competitive accuracy with superior memory efficiency.
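For reference, CER is conventionally the character-level edit (Levenshtein) distance between the predicted and reference transcriptions, divided by the number of reference characters. The sketch below assumes that standard definition; the benchmark's exact normalization (per line, or edits summed over the whole test set) is not stated here, and the function names are illustrative.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character insertions, deletions, and substitutions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))  # distances from the empty prefix of ref
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # delete r
                cur[j - 1] + 1,          # insert h
                prev[j - 1] + (r != h),  # substitute (free if characters match)
            )
        prev = cur
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance normalized by reference length."""
    if not ref:
        raise ValueError("reference transcription must be non-empty")
    return levenshtein(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Misreading "rn" as "m", a typical degraded-print error, costs two edits.
    print(levenshtein("le journal", "le joumal"))  # 2
    print(cer("le journal", "le joumal"))          # 0.2
```

At a CER of about 2%, this means roughly one character error every 50 characters, so a 1,000-character paragraph would contain around 20 errors.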

Abstract

End-to-end OCR for historical newspapers remains challenging, as models must handle long text sequences, degraded print quality, and complex layouts. While Transformer-based recognizers dominate current research, their complexity, quadratic in sequence length, limits efficient paragraph-level transcription and large-scale deployment. We investigate linear-time State-Space Models (SSMs), specifically Mamba, as a scalable alternative to Transformer-based sequence modeling for OCR. We present, to our knowledge, the first OCR architecture based on SSMs, combining a CNN visual encoder with bidirectional and autoregressive Mamba sequence modeling, and conduct a large-scale benchmark comparing SSMs with Transformer- and BiLSTM-based recognizers. Multiple decoding strategies (CTC, autoregressive, and non-autoregressive) are evaluated under identical training conditions alongside strong neural baselines (VAN, DAN,…
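Of the decoding strategies listed, CTC is the simplest to illustrate. Below is a minimal sketch of the textbook greedy (best-path) CTC decoder: take the most probable symbol per frame, collapse consecutive repeats, then drop the blank. It assumes the blank sits at index 0 of a hypothetical alphabet string; it is not necessarily the decoder used in the benchmark, which could, for instance, use beam search instead.

```python
from typing import Sequence


def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]], alphabet: str, blank_index: int = 0) -> str:
    """Best-path CTC decoding over per-frame symbol probabilities.

    frame_probs[t][k] is the probability of alphabet[k] at frame t;
    alphabet[blank_index] is the CTC blank (hypothetical convention: index 0).
    """
    # 1. Most probable symbol index at every frame.
    best = [max(range(len(p)), key=lambda k: p[k]) for p in frame_probs]
    out = []
    prev = None
    for k in best:
        # 2. Emit only when the symbol changes, and 3. never emit the blank.
        if k != prev and k != blank_index:
            out.append(alphabet[k])
        prev = k
    return "".join(out)


if __name__ == "__main__":
    alphabet = "-alp"  # "-" is the blank
    path = [1, 1, 0, 2, 0, 2]  # a a - l - l
    probs = [[0.9 if j == i else 0.05 for j in range(len(alphabet))] for i in path]
    print(ctc_greedy_decode(probs, alphabet))  # "all"
```

The blank between the two "l" frames is what lets CTC output a genuine double letter; without it, the repeated "l" would be collapsed into one.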


Code & Models

Repositories

github
