OCR for TIFF Compressed Document Images Directly in Compressed Domain Using Text segmentation and Hidden Markov Model
Dikshit Sharma, Mohammed Javed

TL;DR
This paper presents a novel OCR method that directly processes CCITT compressed TIFF document images in their compressed form, utilizing text segmentation and Hidden Markov Models to recognize text without decompression.
Contribution
It introduces a new approach for OCR on compressed images using text segmentation and HMM, avoiding decompression for efficiency.
Findings
OCR on pass modes yields promising results
Direct compressed domain processing reduces computational effort
Method effectively recognizes text in compressed TIFF images
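For background on the "pass modes" mentioned above: in the two-dimensional CCITT coding defined by ITU-T T.4/T.6, each scanline is coded against the line above it using pass, vertical, or horizontal modes. The sketch below is a minimal illustration of how those modes are selected for one row, assuming uncompressed pixel rows as input (0 = white, 1 = black). The function names `find_diff` and `coding_modes` are hypothetical, and this is not the authors' recognition pipeline, which reads the modes from the compressed stream rather than recomputing them from pixels.

```python
def find_diff(row, start, color):
    """Return the first index >= start whose pixel differs from `color` (len(row) if none)."""
    i = start
    while i < len(row) and row[i] == color:
        i += 1
    return i


def coding_modes(ref, cur):
    """List the 2D coding modes used to code `cur` against reference line `ref`.

    Both rows are equal-length, non-empty lists of 0 (white) / 1 (black).
    Returns labels such as 'P' (pass), 'H' (horizontal), 'V0', 'VR1', 'VL2'.
    """
    assert len(ref) == len(cur) and len(cur) > 0
    n = len(cur)
    modes = []
    a0 = 0
    # The line starts on an imaginary white pixel, so a1/b1 are the first black pixels.
    a1 = 0 if cur[0] else find_diff(cur, 0, 0)
    b1 = 0 if ref[0] else find_diff(ref, 0, 0)
    while True:
        b2 = find_diff(ref, b1, ref[b1]) if b1 < n else n
        if b2 < a1:
            # Pass mode: a reference-line run ends before the next change in the current line.
            modes.append("P")
            a0 = b2
        elif abs(a1 - b1) <= 3:
            # Vertical mode: a1 is within 3 pixels of b1.
            d = a1 - b1
            modes.append("V0" if d == 0 else ("VR%d" % d if d > 0 else "VL%d" % -d))
            a0 = a1
        else:
            # Horizontal mode: the next two runs are coded explicitly.
            modes.append("H")
            a0 = find_diff(cur, a1, cur[a1]) if a1 < n else n
        if a0 >= n:
            return modes
        color = cur[a0]
        a1 = find_diff(cur, a0, color)
        # b1: first change on the reference line right of a0, of opposite colour to a0.
        b1 = find_diff(ref, a0, 1 - color)
        b1 = find_diff(ref, b1, color)


# A black run in the reference line with nothing below it triggers a pass mode.
print(coding_modes([0, 1, 1, 0, 0, 0, 0, 0], [0] * 8))  # -> ['P', 'V0']
```

In this toy example the pass mode marks the place where a stroke present on the line above ends, which hints at why pass modes carry shape information useful for text analysis.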
Abstract
In today's technological era, document images play an important and integral part in our day-to-day life, and specifically with the surge of Covid-19, digitally scanned documents have become a key source of communication, thus avoiding any sort of infection through physical contact. Storage and transmission of scanned document images is a very memory-intensive task, hence compression techniques are used to reduce the image size before archival and transmission. There are two ways to extract information from, or operate on, compressed images. The first is to decompress the image, operate on it, and then compress it again for efficient storage and transmission. The other is to use the characteristics of the underlying compression algorithm to process the images directly in their compressed form, without decompression and re-compression. In…
Taxonomy
Topics: Handwritten Text Recognition Techniques · Image Retrieval and Classification Techniques · Image Processing and 3D Reconstruction
