Automatic Page Segmentation Without Decompressing the Run-Length Compressed Text Documents
Mohammed Javed, P. Nagabhushan

TL;DR
This paper presents a novel method for performing page segmentation directly on run-length compressed CCITT Group-3 text documents, eliminating the need for decompression and improving efficiency in document analysis.
Contribution
It introduces a new approach for direct page segmentation in compressed documents, including strategies for handling inverted text regions and estimating parameters automatically.
Findings
Effective segmentation of multi-column and inverted text regions.
Successful direct segmentation without decompression.
Improved efficiency in document layout analysis.
Abstract
Page segmentation is considered a crucial stage in the automatic analysis of documents with complex layouts. It has traditionally been carried out on uncompressed documents, although most real-world documents exist in compressed form to make storage and transfer efficient. Carrying out page segmentation directly on compressed documents, without a decompression stage, is a challenging goal. This paper demonstrates that page segmentation can be carried out directly on the run-length data of a CCITT Group-3 compressed text document, which may be single- or multi-column and may even contain text regions in inverted color mode. Therefore, before carrying out the segmentation of the text document into columns, each column into paragraphs, each paragraph into…
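As a rough illustration of what working on run-length data can look like, the Python sketch below computes a horizontal projection profile from run lengths alone, splits the page into bands of non-blank rows, and flags a band as possibly inverted when black pixels dominate it. This is not the authors' algorithm. The row representation (alternating white/black run lengths, starting with a white run that may be zero), the function names, and the 0.5 threshold are all assumptions made for this example.

```python
from typing import List, Tuple

# Assumed representation: each row is a list of alternating run lengths,
# white first (possibly 0), then black, then white, and so on.
Row = List[int]


def black_pixels(row: Row) -> int:
    """Count black pixels in a row by summing the black runs (odd positions)."""
    return sum(row[1::2])


def horizontal_profile(rows: List[Row]) -> List[int]:
    """Black-pixel count per row, computed without expanding runs to pixels."""
    return [black_pixels(r) for r in rows]


def split_bands(profile: List[int]) -> List[Tuple[int, int]]:
    """Return (top, bottom) row indices of maximal runs of non-blank rows."""
    bands = []
    start = None
    for y, count in enumerate(profile):
        if count > 0 and start is None:
            start = y
        elif count == 0 and start is not None:
            bands.append((start, y - 1))
            start = None
    if start is not None:
        bands.append((start, len(profile) - 1))
    return bands


def looks_inverted(rows: List[Row], band: Tuple[int, int], width: int,
                   threshold: float = 0.5) -> bool:
    """Heuristic (assumed): a band is inverted if black pixels exceed
    `threshold` of its area."""
    top, bottom = band
    black = sum(black_pixels(r) for r in rows[top:bottom + 1])
    return black > threshold * width * (bottom - top + 1)


# Toy page, 10 pixels wide.
rows = [
    [10],        # blank row
    [2, 3, 5],   # 3 black pixels
    [1, 4, 5],   # 4 black pixels
    [10],        # blank row
    [0, 10],     # fully black row
    [0, 9, 1],   # 9 black pixels
    [10],        # blank row
]
profile = horizontal_profile(rows)
bands = split_bands(profile)
flags = [looks_inverted(rows, b, width=10) for b in bands]
```

On this toy page the first band is ordinary dark-on-light text and the second is flagged as possibly inverted. A real page would need a noise-tolerant gap threshold rather than an exact zero test.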
Taxonomy
Topics: Handwritten Text Recognition Techniques · Vehicle License Plate Recognition · Image Retrieval and Classification Techniques
