Multimodal LLMs for OCR, OCR Post-Correction, and Named Entity Recognition in Historical Documents
Gavin Greif, Niclas Griesshaber, Robin Greif

TL;DR
This paper demonstrates that multimodal Large Language Models significantly improve OCR accuracy, enable effective OCR post-correction, and facilitate named entity recognition in historical documents, offering a new paradigm for digital humanities research.
Contribution
The paper applies multimodal LLMs to OCR, OCR post-correction, and NER on historical documents. It reports that the best-performing mLLM outperforms conventional OCR models, and it introduces multimodal post-correction of OCR output.
Findings
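The best-performing mLLM outperforms conventional state-of-the-art OCR models and other frontier mLLMs
Multimodal post-correction drastically reduces transcription errors, bringing the character error rate (CER) below 1%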
mLLMs efficiently extract structured data from historical texts
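The CER figure in the findings above is a character error rate. A minimal sketch of how such a metric is commonly computed, assuming the usual definition (Levenshtein edit distance between the reference and the OCR output, divided by the reference length), is given below. The function names `levenshtein` and `cer` are illustrative, and the paper's exact text normalization (whitespace, case, Unicode form) is not stated here, so this should not be read as the authors' evaluation code.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum insertions, deletions, and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate of hypothesis against a ground-truth reference."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # One wrong character in a nine-character reference: CER = 1/9.
    print(f"{cer('Schneider', 'Schnejder'):.3f}")
```

Under this definition, a CER below 1% means fewer than one character edit per 100 reference characters.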
Abstract
We explore how multimodal Large Language Models (mLLMs) can help researchers transcribe historical documents, extract relevant historical information, and construct datasets from historical sources. Specifically, we investigate the capabilities of mLLMs in performing (1) Optical Character Recognition (OCR), (2) OCR Post-Correction, and (3) Named Entity Recognition (NER) tasks on a set of city directories published in German between 1754 and 1870. First, we benchmark the off-the-shelf transcription accuracy of both mLLMs and conventional OCR models. We find that the best-performing mLLM significantly outperforms conventional state-of-the-art OCR models and other frontier mLLMs. Second, we are the first to introduce multimodal post-correction of OCR output using mLLMs. We find that this novel approach leads to a drastic improvement in transcription accuracy and consistently produces…
Taxonomy
Topics: Handwritten Text Recognition Techniques · Topic Modeling · Multimodal Machine Learning Applications
Methods: Sparse Evolutionary Training
