Improving Automatic Text Recognition with Language Models in the PyLaia Open-Source Library
Solène Tarride, Yoann Schneider, Marie Generali-Lince, Mélodie Boillet, Bastien Abadie, Christopher Kermorvant

TL;DR
This paper enhances the PyLaia OCR library by integrating auto-tuned language models and confidence scoring, significantly improving recognition accuracy across multiple datasets with minimal expert intervention.
Contribution
It introduces an easy-to-use, auto-tuned language modeling integration in PyLaia, boosting OCR performance without requiring additional data or expert knowledge.
Findings
Word Error Rate reduced by 13% on average
Character Error Rate reduced by 12% on average
Confidence score calibration improves reliability
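For readers unfamiliar with the two metrics above, here is a minimal sketch of how Word Error Rate and Character Error Rate are commonly computed: the Levenshtein edit distance between reference and hypothesis, divided by the reference length. The function names `edit_distance`, `cer`, and `wer` are illustrative choices, not part of PyLaia, and the evaluation protocol used in the paper (tokenization, normalization) may differ.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of tokens)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting i reference symbols to reach an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if r == h else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character Error Rate: character-level edits per reference character."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word Error Rate: word-level edits per reference word (whitespace split)."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must not be empty")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)
```

With the reference "hello world" and the hypothesis "hallo world", one character out of eleven and one word out of two differ, so `cer` gives 1/11 and `wer` gives 0.5. A single wrong character can therefore cost a full word error, which is why the two metrics are reported separately.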
Abstract
PyLaia is one of the most popular open-source software packages for Automatic Text Recognition (ATR), delivering strong performance in terms of speed and accuracy. In this paper, we outline our recent contributions to the PyLaia library, focusing on the incorporation of reliable confidence scores and the integration of statistical language modeling during decoding. Our implementation provides an easy way to combine PyLaia with n-gram language models at different levels. One of the highlights of this work is that language models are completely auto-tuned: they can be built and used easily without any expert knowledge, and without requiring any additional data. To demonstrate the significance of our contribution, we evaluate PyLaia's performance on twelve datasets, both with and without language modeling. The results show that decoding with small language models improves the Word Error Rate by…
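To illustrate the general idea of decoding with an n-gram language model, the sketch below trains a small character-level n-gram model on transcriptions and uses it to rescore candidate outputs from a recognizer. It is a simplified stand-in, not PyLaia's implementation or API: the class `CharNGramLM`, the function `rescore`, the add-k smoothing, and the `lm_weight` parameter are all assumptions made for this example, and the candidate scores are made up.

```python
import math
from collections import Counter


class CharNGramLM:
    """Character n-gram language model with add-k smoothing (illustrative only)."""

    def __init__(self, order=3, k=0.1):
        self.order = order
        self.k = k
        self.ngrams = Counter()    # counts of (context..., symbol)
        self.contexts = Counter()  # counts of (context...)
        self.vocab = set()

    def _pad(self, text):
        # Start/end markers let the model score line beginnings and endings.
        return ["<s>"] * (self.order - 1) + list(text) + ["</s>"]

    def fit(self, lines):
        for line in lines:
            symbols = self._pad(line)
            self.vocab.update(symbols)
            for i in range(self.order - 1, len(symbols)):
                context = tuple(symbols[i - self.order + 1:i])
                self.ngrams[context + (symbols[i],)] += 1
                self.contexts[context] += 1
        return self

    def log_prob(self, text):
        vocab_size = len(self.vocab) + 1  # one extra slot for unseen symbols
        symbols = self._pad(text)
        total = 0.0
        for i in range(self.order - 1, len(symbols)):
            context = tuple(symbols[i - self.order + 1:i])
            numerator = self.ngrams[context + (symbols[i],)] + self.k
            denominator = self.contexts[context] + self.k * vocab_size
            total += math.log(numerator / denominator)
        return total


def rescore(candidates, lm, lm_weight=1.0):
    """Return the text maximizing recognizer log-score + lm_weight * LM log-prob.

    `candidates` is a list of (text, recognizer_log_score) pairs.
    """
    return max(candidates, key=lambda c: c[1] + lm_weight * lm.log_prob(c[0]))[0]


# Hypothetical usage: the model is built from training transcriptions only.
lm = CharNGramLM(order=3).fit(["the cat sat", "the hat", "that cat"])
candidates = [("the cat", -2.0), ("thc cat", -1.9)]  # made-up recognizer scores
best = rescore(candidates, lm, lm_weight=1.0)
```

In this toy case the recognizer slightly prefers "thc cat", but the language model assigns it a much lower probability, so the combined score selects "the cat". Setting `lm_weight=0.0` recovers the recognizer's own choice, which shows why the weight between the two scores has to be tuned.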
Taxonomy
Topics: Handwritten Text Recognition Techniques · Natural Language Processing Techniques
