Text Line Identification in Tagore's Manuscript

Chandranath Adak; Bidyut B. Chaudhuri

arXiv:1408.6911·cs.CV·August 19, 2016

TL;DR

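This paper presents a method for identifying text lines in handwritten manuscripts that contain doodles, applied to Rabindranath Tagore's manuscripts. It cleans the page image, separates the doodles from the textual region, and then finds text lines using window examination, black run-length smearing, horizontal histogram analysis, and connected component analysis.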

Contribution

The paper introduces an approach that combines window examination, black run-length smearing, horizontal histogram analysis, and connected component analysis to detect text lines in difficult handwritten manuscripts.

Findings

01

Effective separation of doodles from text regions.

02

Successful identification of text lines in complex handwritten manuscripts.

03

Applicable to manuscripts with non-uniform line structures.

Abstract

This paper proposes a text line identification method. Text lines in a printed document are easy to segment because the lines are uniformly straight and the gaps between them are sufficient. In handwritten documents, however, lines are non-uniform and interline gaps vary. We work with Rabindranath Tagore's manuscripts, which contain doodles and are among the most difficult to process. Our method begins with a pre-processing stage that cleans the document image. We then separate the doodles from the manuscript to obtain the textual region, and finally identify the text lines in that region. For text line identification, we use window examination, black run-length smearing, horizontal histogram analysis, and connected component analysis.
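To make two of the named stages concrete, here is a minimal sketch of horizontal run-length smearing followed by a horizontal histogram, assuming the page has already been binarized into a NumPy boolean array in which `True` marks ink. The function names `smear_rows` and `line_bands` and the parameters `max_gap` and `min_ink` are illustrative assumptions, not names or values from the paper. This is not the authors' implementation: pre-processing, doodle separation, window examination, and connected component analysis are all left out.

```python
import numpy as np


def smear_rows(binary, max_gap):
    """Horizontal run-length smearing (illustrative, not the paper's code).

    On each row, fill any white gap of at most `max_gap` pixels that lies
    between two ink pixels, so words on the same line merge into one blob.
    """
    out = binary.copy()
    for r in range(binary.shape[0]):
        ink_cols = np.flatnonzero(binary[r])
        # Walk over consecutive ink pixels and close the short gaps between them.
        for a, b in zip(ink_cols[:-1], ink_cols[1:]):
            if 1 < b - a <= max_gap + 1:
                out[r, a:b] = True
    return out


def line_bands(binary, min_ink=1):
    """Return (top, bottom) row ranges, bottom exclusive, whose horizontal
    histogram (ink pixels per row) is at least `min_ink`."""
    hist = binary.sum(axis=1)
    active = hist >= min_ink
    bands = []
    start = None
    for r, on in enumerate(active):
        if on and start is None:
            start = r                      # a band of inked rows begins
        elif not on and start is not None:
            bands.append((start, r))       # the band ended on the previous row
            start = None
    if start is not None:
        bands.append((start, len(active)))  # band runs to the bottom edge
    return bands


# Toy 7x12 "page" (an assumption for illustration): two text lines
# separated by blank rows.
page = np.zeros((7, 12), dtype=bool)
page[1, [1, 2, 5, 6]] = True   # two short words on the first line
page[2, [1, 9]] = True         # a descender and a stray mark
page[5, [0, 1, 2, 3]] = True   # second line

smeared = smear_rows(page, max_gap=3)
print(line_bands(smeared))     # [(1, 3), (5, 6)]
```

In this toy page the two-pixel gap on row 1 is closed, while the seven-pixel gap on row 2 stays open because it exceeds `max_gap`. Real handwriting is rarely this well behaved: skewed or touching lines will not separate on a global histogram alone, which is presumably why the paper pairs the histogram with window examination and connected component analysis.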
