OCR Post-Processing Error Correction Algorithm using Google Online Spelling Suggestion
Youssef Bassil, Mohammad Alwani

TL;DR
This paper presents a post-processing error correction algorithm for OCR output that leverages Google's online spelling suggestions to significantly improve the correction accuracy for misspelled words.
Contribution
It introduces a context-based correction method that draws on Google's web-scale database of terms and word sequences, extending OCR error correction beyond traditional approaches.
Findings
Significant improvement in OCR error correction rate
Effective detection and correction of non-word and real-word errors
Potential for parallelization and faster processing
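The parallelization point can be illustrated with a small sketch. It assumes the OCR text has already been split into independent blocks, so each block's suggestion lookup can run concurrently. `correct_blocks_in_parallel` and `fake_suggest` are hypothetical names, and `fake_suggest` is only a local stand-in for a remote spelling-suggestion call, not Google's service.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def correct_blocks_in_parallel(
    blocks: List[str],
    suggest: Callable[[str], str],
    max_workers: int = 4,
) -> List[str]:
    """Query the (hypothetical) suggester for every block concurrently.

    Network-bound lookups spend most of their time waiting on replies,
    so threads let several requests be in flight at once. executor.map
    yields results in input order, so the blocks reassemble correctly.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(suggest, blocks))


def fake_suggest(block: str) -> str:
    # Hypothetical stand-in for a remote spelling-suggestion request:
    # it only fixes the common OCR confusion of the digit 0 for the letter o.
    return block.replace("0", "o")


if __name__ == "__main__":
    blocks = ["g0od m0rning", "h0w are you", "see y0u soon"]
    print(" ".join(correct_blocks_in_parallel(blocks, fake_suggest)))
```

Threads are a natural fit here because the work is I/O-bound; CPU-bound correction steps would call for processes instead.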
Abstract
With the advent of digital optical scanners, many paper-based books, textbooks, magazines, articles, and documents are being converted into electronic versions that can be manipulated by a computer. For this purpose, OCR, short for Optical Character Recognition, was developed to translate scanned graphical text into editable computer text. Unfortunately, OCR is still imperfect: it occasionally misrecognizes letters and misidentifies scanned text, leading to misspellings and linguistic errors in the OCR output text. This paper proposes a post-processing context-based error correction algorithm for detecting and correcting OCR non-word and real-word errors. The proposed algorithm is based on Google's online spelling suggestion, which harnesses an internal database containing a huge collection of terms and word sequences gathered from all over the web, convenient to suggest…
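The core idea, submitting OCR text to a context-aware suggestion service and keeping its proposed spelling, can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the block size of 5 words, the names `chunk_words`, `correct_ocr_text` and `toy_suggest`, and the lookup-table suggester are all hypothetical. `toy_suggest` merely stands in for Google's online spelling suggestion, which is not modelled here.

```python
from typing import Callable, List, Optional

# Hypothetical interface for a spelling-suggestion service: it receives a
# block of words and returns a suggested block, or None/"" if it has none.
Suggester = Callable[[str], Optional[str]]


def chunk_words(words: List[str], size: int) -> List[List[str]]:
    """Split the token list into consecutive blocks of at most `size` words."""
    return [words[i:i + size] for i in range(0, len(words), size)]


def correct_ocr_text(text: str, suggest: Suggester, block_size: int = 5) -> str:
    """Send each word block to `suggest` and keep its suggestion.

    Querying whole blocks rather than single words gives the suggester
    surrounding context, which is what allows a context-based service to
    flag real-word errors (valid words used in the wrong place).
    """
    words = text.split()
    corrected_blocks = []
    for block in chunk_words(words, block_size):
        query = " ".join(block)
        suggestion = suggest(query)
        # Fall back to the original block when there is no suggestion.
        corrected_blocks.append(suggestion or query)
    return " ".join(corrected_blocks)


# Toy offline suggester for illustration only: a lookup table of known
# block corrections, NOT the behaviour of Google's service.
_TOY_SUGGESTIONS = {
    "the quick brovvn fox jumps": "the quick brown fox jumps",  # non-word error
    "over the lazy dot": "over the lazy dog",  # real-word error
}


def toy_suggest(query: str) -> Optional[str]:
    return _TOY_SUGGESTIONS.get(query)


if __name__ == "__main__":
    ocr_output = "the quick brovvn fox jumps over the lazy dot"
    print(correct_ocr_text(ocr_output, toy_suggest))
```

With fixed, non-overlapping blocks, an error that straddles a block boundary loses part of its context; overlapping windows would be one way to mitigate that, at the cost of more queries.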
Taxonomy
Topics: Handwritten Text Recognition Techniques · Image Retrieval and Classification Techniques · Vehicle License Plate Recognition
