Topographic Feature Extraction for Bengali and Hindi Character Images
Soumen Bag, Gaurav Harit

TL;DR
This paper introduces novel topographic features based on stroke structure and spatial relations for Bengali and Hindi OCR, improving character discrimination in printed and handwritten images.
Contribution
It proposes a new shape-based graph feature set derived from the topography of strokes as seen from multiple viewing directions, with the aim of improving OCR accuracy for complex scripts.
Findings
Similar-looking characters are discriminated effectively
Features work well on both printed and handwritten images
Initial results show promising OCR performance improvements
Abstract
Feature selection and extraction play an important role in classification-based problems such as face recognition, signature verification, and optical character recognition (OCR). The performance of OCR depends heavily on the proper selection and extraction of the feature set. In this paper, we present novel features based on the topography of a character as visible from different viewing directions on a 2D plane. By the topography of a character we mean the structural features of its strokes and their spatial relations. In this work we develop topographic features of strokes visible with respect to views from different directions (e.g. North, South, East, and West). We consider three types of topographic features: closed regions, convexity of strokes, and straight-line strokes. These features are represented as a shape-based graph which acts as an invariant feature set for…
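To make the idea of directional views concrete, here is a minimal Python sketch. It is not the authors' method, only an illustration under stated assumptions: the character is a binary image stored as a list of rows (1 = stroke pixel, 0 = background), a "view" is approximated by the depth of the first stroke pixel met along each scan line from that side, and a closed region is approximated as a background component that does not reach the image border. The names `view_profile`, `count_closed_regions`, `RING`, and `CUP` are hypothetical.

```python
from collections import deque


def view_profile(img, direction):
    """Depth of the first stroke pixel on each scan line, seen from one side.

    img is a list of rows of 0/1 values; direction is 'N', 'S', 'E' or 'W'.
    A scan line with no stroke pixel yields None.
    (Illustrative approximation of a directional view, not the paper's definition.)
    """
    rows, cols = len(img), len(img[0])
    if direction in ("N", "S"):
        lines = [[img[r][c] for r in range(rows)] for c in range(cols)]
    elif direction in ("W", "E"):
        lines = [list(row) for row in img]
    else:
        raise ValueError("direction must be one of 'N', 'S', 'E', 'W'")
    if direction in ("S", "E"):
        # Looking from the opposite side means scanning each line backwards.
        lines = [line[::-1] for line in lines]
    return [line.index(1) if 1 in line else None for line in lines]


def count_closed_regions(img):
    """Count 4-connected background components that never touch the border."""
    rows, cols = len(img), len(img[0])
    seen = [[False] * cols for _ in range(rows)]
    closed = 0
    for start_r in range(rows):
        for start_c in range(cols):
            if img[start_r][start_c] != 0 or seen[start_r][start_c]:
                continue
            # Breadth-first flood fill of one background component.
            touches_border = False
            queue = deque([(start_r, start_c)])
            seen[start_r][start_c] = True
            while queue:
                r, c = queue.popleft()
                if r in (0, rows - 1) or c in (0, cols - 1):
                    touches_border = True
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and not seen[nr][nc] and img[nr][nc] == 0):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            if not touches_border:
                closed += 1
    return closed


# Toy glyphs: a ring (one enclosed region) and a cup that is open to the North.
RING = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
CUP = [
    [1, 0, 1],
    [1, 0, 1],
    [1, 1, 1],
]

print(count_closed_regions(RING), count_closed_regions(CUP))
print(view_profile(CUP, "N"), view_profile(CUP, "S"))
```

In this toy setup the cup's North profile dips in the middle column while its South profile is flat, which is the kind of direction-dependent cue (a concavity visible from only one side) that the abstract describes. Building the shape-based graph from such cues is beyond this sketch.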
