Improving Automatic Text Recognition with Language Models in the PyLaia Open-Source Library

Solène Tarride; Yoann Schneider; Marie Generali-Lince; Mélodie Boillet; Bastien Abadie; Christopher Kermorvant

arXiv:2404.18722·cs.CV·April 30, 2024·1 cites

Open Access

TL;DR

This paper extends the PyLaia Automatic Text Recognition (ATR) library with auto-tuned n-gram language models and confidence scoring, improving recognition accuracy across twelve datasets with minimal expert intervention.

Contribution

It introduces an easy-to-use, auto-tuned language model integration in PyLaia that improves ATR performance without requiring additional data or expert knowledge.

Findings

01 Word Error Rate reduced by 13% on average

02 Character Error Rate reduced by 12% on average

03 Confidence score calibration improves reliability
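To make the two error-rate figures concrete, here is a minimal sketch of how Word Error Rate and Character Error Rate are commonly computed from an edit distance, and how a percentage reduction can be derived. The function names are illustrative and are not part of PyLaia. The sketch assumes the reported percentages are relative reductions, which this page does not state.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (unit cost per edit)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (free if equal)
            ))
        prev = curr
    return prev[-1]


def error_rate(refs, hyps, unit="char"):
    """Total edits divided by total reference length, over a corpus.

    unit="word" gives WER (whitespace tokens), unit="char" gives CER.
    """
    total_edits = 0
    total_len = 0
    for ref, hyp in zip(refs, hyps):
        r = ref.split() if unit == "word" else list(ref)
        h = hyp.split() if unit == "word" else list(hyp)
        total_edits += edit_distance(r, h)
        total_len += len(r)
    return total_edits / total_len


def relative_reduction(before, after):
    """Relative improvement, e.g. 0.13 means the error rate dropped by 13%."""
    return (before - after) / before
```

For example, under the relative-reduction assumption, a hypothetical WER of 20.0% falling to 17.4% corresponds to a reduction of 13%.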

Abstract

PyLaia is one of the most popular open-source software packages for Automatic Text Recognition (ATR), delivering strong performance in terms of speed and accuracy. In this paper, we outline our recent contributions to the PyLaia library, focusing on the incorporation of reliable confidence scores and the integration of statistical language modeling during decoding. Our implementation provides an easy way to combine PyLaia with n-gram language models at different levels. One of the highlights of this work is that language models are completely auto-tuned: they can be built and used easily without any expert knowledge, and without requiring any additional data. To demonstrate the significance of our contribution, we evaluate PyLaia's performance on twelve datasets, both with and without language modeling. The results show that decoding with small language models improves the Word Error Rate by…
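As a rough illustration of what combining a recognizer with an n-gram language model during decoding can look like, the sketch below rescores candidate transcriptions with a character-level bigram model. It is not PyLaia's API or the authors' implementation: the class `CharBigramLM`, the function `rescore`, and the `lm_weight` parameter are hypothetical, the add-one smoothing is a simplification, and training the model on the training-set transcriptions is an assumption about what "without requiring any additional data" means.

```python
import math
from collections import Counter


class CharBigramLM:
    """Character bigram model with add-one smoothing (illustrative only)."""

    def __init__(self, texts):
        self.bigrams = Counter()
        self.contexts = Counter()
        self.vocab = {"</s>"}
        for text in texts:
            symbols = ["<s>"] + list(text) + ["</s>"]
            self.vocab.update(symbols[1:])
            for prev, curr in zip(symbols, symbols[1:]):
                self.bigrams[(prev, curr)] += 1
                self.contexts[prev] += 1

    def log_prob(self, text):
        """Natural-log probability of a string under the bigram model."""
        symbols = ["<s>"] + list(text) + ["</s>"]
        size = len(self.vocab) + 1  # one extra slot for unseen characters
        total = 0.0
        for prev, curr in zip(symbols, symbols[1:]):
            count = self.bigrams[(prev, curr)] + 1
            total += math.log(count / (self.contexts[prev] + size))
        return total


def rescore(candidates, lm, lm_weight=1.0):
    """Pick the candidate maximizing optical score + weighted LM score.

    candidates: list of (text, optical_log_prob) pairs from the recognizer.
    lm_weight:  hypothetical knob balancing the two scores.
    """
    best = max(candidates, key=lambda c: c[1] + lm_weight * lm.log_prob(c[0]))
    return best[0]


# Assumption: the LM is built from training transcriptions only.
lm = CharBigramLM(["the cat", "the hat", "that cat"])
candidates = [("tne cat", -1.0), ("the cat", -1.2)]
best_without_lm = rescore(candidates, lm, lm_weight=0.0)
best_with_lm = rescore(candidates, lm, lm_weight=1.0)
```

In this toy case the recognizer alone slightly prefers the misspelled candidate, while the language model term shifts the choice to the string whose character sequence was seen in training. A production decoder would typically apply the language model inside beam search rather than to a fixed candidate list.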

Taxonomy

Topics: Handwritten Text Recognition Techniques · Natural Language Processing Techniques