Document Image Coding and Clustering for Script Discrimination
Darko Brodic, Alessia Amelio, Zoran N. Milivojevic, Milena Jevtic

TL;DR
This paper presents a script discrimination method that converts document text into a coded 1-D image, extracts texture features from that image, and clusters the resulting feature vectors to separate documents written in different historical scripts.
Contribution
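It introduces a letter-coding scheme based on typographical characteristics, combined with texture analysis (run-length statistics and local binary pattern) and a modified clustering approach, and reports better script discrimination than existing methods on two databases of historical documents.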
Findings
Superior accuracy on historical script datasets
Effective differentiation of Cyrillic, Glagolitic, Antiqua, and Fraktur scripts
Outperforms state-of-the-art methods in script classification
Abstract
The paper introduces a new method for discriminating documents written in different scripts. The document is mapped into a uniformly coded text of numerical values. The coding is derived from the position of the letters in the text line, based on their typographical characteristics. Each code is treated as a gray level. Accordingly, the coded text determines a 1-D image, on which texture analysis by run-length statistics and local binary pattern is performed. This analysis yields feature vectors representing the script content of the document. A modified clustering approach applied to the document feature vectors groups documents written in the same script. Experiments on two custom-oriented databases of historical documents in old Cyrillic, angular and round Glagolitic, as well as Antiqua and Fraktur scripts demonstrate the superiority of the proposed method with respect to well-known…
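The coding and texture steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the letter classes below are hypothetical Latin-alphabet sets chosen for illustration, the four gray levels (0 to 3) are an assumed encoding, and the two-neighbor local binary pattern is one simple 1-D variant. The function names code_text, run_length_counts, and lbp_1d_histogram are invented here. The clustering stage is omitted because the abstract does not specify it.

```python
from collections import Counter
from itertools import groupby

# Hypothetical letter classes for the Latin alphabet (an assumption,
# not the paper's coding tables), grouped by vertical extent in the line.
ASCENDERS = set("bdfhklt")
DESCENDERS = set("gpqy")
FULL = set("j")


def code_text(text):
    """Map each letter to an assumed gray level 0..3; non-letters are skipped."""
    codes = []
    for ch in text:
        if not ch.isalpha():
            continue
        if ch.isupper() or ch in ASCENDERS:
            codes.append(1)  # occupies the upper zone
        elif ch in DESCENDERS:
            codes.append(2)  # occupies the lower zone
        elif ch in FULL:
            codes.append(3)  # occupies both zones
        else:
            codes.append(0)  # stays within the base zone
    return codes


def run_length_counts(codes):
    """Count runs of equal gray levels as (level, run_length) pairs."""
    return Counter((level, len(list(run))) for level, run in groupby(codes))


def lbp_1d_histogram(codes):
    """Histogram of 2-bit patterns comparing left/right neighbors to the center."""
    hist = [0, 0, 0, 0]
    for i in range(1, len(codes) - 1):
        center = codes[i]
        left_bit = int(codes[i - 1] >= center)
        right_bit = int(codes[i + 1] >= center)
        hist[(left_bit << 1) | right_bit] += 1
    return hist


codes = code_text("baggy")
runs = run_length_counts(codes)
lbp = lbp_1d_histogram(codes)
```

For the word "baggy" this sketch produces the codes [1, 0, 2, 2, 2], one run of length 3 at level 2, and the pattern histogram [0, 1, 0, 2]. Concatenating run-length and pattern statistics per document would give a feature vector of the kind the abstract passes to clustering.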
Taxonomy
Topics: Handwritten Text Recognition Techniques · Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques
