Spotting Separator Points at Line Terminals in Compressed Document Images for Text-line Segmentation
Amarnath R, P. Nagabhushan

TL;DR
This paper proposes a method to identify text-line separator points directly in compressed document images using run length encoding, enabling efficient text-line segmentation without decompression, especially in handwritten documents.
Contribution
It introduces a novel approach to detect separator points in RLE compressed images for text-line segmentation, reducing computational load and handling over/under separation issues.
Findings
Effective in the compressed domain for printed and handwritten text
Reduces the need for full image decompression
Validated on ICDAR13 and Alireza datasets
Abstract
Line separators are used to segregate text-lines from one another in document image analysis. Finding the separator points at every line terminal in a document image would enable text-line segmentation. Identifying the separators is particularly difficult in handwritten text, and doing so directly in the compressed version of a document image is more challenging still; that is the objective of this research. Such an effort would avoid the computational burden of decompressing a document for text-line segmentation. Since document images are generally compressed using the run length encoding (RLE) technique as per the CCITT standards, the first column in the RLE will be a white column. The value (depth) in the white column is very low when a particular line is a text line, and the depth could be larger at the point of text line separation. A longer…
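The observation about the first white column can be illustrated with a minimal sketch. It is not the authors' algorithm: the function names `rle_encode_row` and `separator_rows`, the fixed `min_depth` threshold, and the choice of the band midpoint as the separator are all assumptions made for illustration. Pixels are taken as 0 for white and 1 for black, and each row is encoded as alternating run lengths starting with a white run.

```python
from typing import List


def rle_encode_row(row: List[int]) -> List[int]:
    """Encode one row as alternating white/black run lengths.

    The first run is always white (0), so a row that starts with a
    black pixel gets a leading white run of length 0.
    """
    runs = []
    current = 0  # runs start with white by convention
    count = 0
    for pixel in row:
        if pixel == current:
            count += 1
        else:
            runs.append(count)
            current = pixel
            count = 1
    runs.append(count)
    return runs


def separator_rows(image: List[List[int]], min_depth: int) -> List[int]:
    """Return one candidate separator row per band of deep first white runs.

    `min_depth` (assumed > 0) is a hypothetical threshold: rows whose
    first white run is at least this long are treated as gaps between
    text lines. Only the first RLE column is inspected.
    """
    first_white = [rle_encode_row(row)[0] for row in image]
    separators = []
    start = None
    # A trailing 0 acts as a sentinel that closes a band at the last row.
    for y, depth in enumerate(first_white + [0]):
        if depth >= min_depth:
            if start is None:
                start = y
        elif start is not None:
            separators.append((start + y - 1) // 2)  # midpoint of the band
            start = None
    return separators


image = [
    [0, 0, 0, 0, 0, 0],  # top margin
    [0, 1, 1, 0, 1, 0],  # text
    [0, 1, 0, 1, 1, 0],  # text
    [0, 0, 0, 0, 0, 0],  # gap
    [0, 0, 0, 0, 0, 0],  # gap
    [0, 0, 0, 0, 0, 0],  # gap
    [0, 1, 1, 1, 0, 0],  # text
    [0, 0, 0, 0, 0, 0],  # bottom margin
]
print(separator_rows(image, min_depth=4))
```

In this toy image the text rows have a first white run of 1, while the blank rows have a run spanning the full width, so thresholding the first column alone is enough to locate the gaps. Real handwritten pages with skewed or touching lines would need more than a fixed threshold.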
