Document Decomposition of Bangla Printed Text

Md. Fahad Hasan; Tasmin Afroz; Sabir Ismail; Md. Saiful Islam

arXiv:1701.08706·cs.CV·January 31, 2017·1 cites

TL;DR

This paper presents a method for decomposing printed Bangla documents into regions such as headlines, images, and columns. It includes preprocessing steps, such as deskewing and de-rotation, to improve OCR accuracy.

Contribution

It introduces a novel algorithm for segmenting Bangla documents into meaningful parts and for handling skewed or rotated images, capabilities that were lacking in existing Bangla OCR tools.
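To make the idea of segmenting a page into regions more concrete, here is a minimal sketch of projection-profile segmentation, a common textbook technique for finding bands of text in a binarized page. It is not taken from the paper and may differ from the authors' algorithm; the function names `row_profile` and `split_bands`, the `min_ink` threshold, and the toy `page` array are all assumptions made for illustration.

```python
# Illustrative sketch only: projection-profile segmentation on a binary image.
# This is an assumed, generic technique, not the paper's own algorithm.

def row_profile(image):
    """Count ink pixels (value 1) in each row of a binary image."""
    return [sum(row) for row in image]


def split_bands(profile, min_ink=1):
    """Return (start, end) pairs, end exclusive, for runs where profile >= min_ink."""
    bands = []
    start = None
    for i, value in enumerate(profile):
        if value >= min_ink and start is None:
            start = i                      # a band of ink begins here
        elif value < min_ink and start is not None:
            bands.append((start, i))       # the band ended on the previous row
            start = None
    if start is not None:                  # band runs to the last row
        bands.append((start, len(profile)))
    return bands


# Hypothetical 8x6 binarized page: 1 = ink, 0 = background.
page = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 0],
    [1, 1, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

bands = split_bands(row_profile(page))
print(bands)
```

Applying the same two functions to the transposed image (for example `list(zip(*page))`) would give vertical bands, which is one plausible way to look for column gaps. A real page would first need binarization and skew correction for the profiles to be meaningful.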

Findings

01. Successfully decomposed Bangla documents into regions
02. Effective deskewing and de-rotation of skewed images
03. Enhanced accuracy for subsequent OCR processing

Abstract

Today all kinds of information are being digitized, and along with this, huge archives of documents of various kinds are being digitized too. Optical Character Recognition (OCR) is the method through which newspapers and other paper documents are converted into digital resources. However, this method works on text only. As a result, if we try to process a document that contains non-textual zones, we get garbage text as output. That is why, in order to digitize documents properly, they should be preprocessed carefully. During preprocessing, properly segmenting the document into different regions according to their category is the most important step. But the Optical Character Recognition processes available for the Bangla language have no algorithm that can fully categorize a newspaper or book page. So we worked to decompose a document into its several…

Taxonomy

Topics: Handwritten Text Recognition Techniques · Algorithms and Data Compression · Vehicle License Plate Recognition