Handwritten and Printed Text Separation in Real Document
Abdel Belaïd (LORIA), K.C. Santosh (LORIA), Vincent Poulain D'Andecy

TL;DR
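This paper presents a method for separating handwritten and printed text in real, noisy documents. It extracts pseudo-words and pseudo-lines with RLSA, labels them with an SVM, and corrects labels with a constrained k-NN step whose cost is almost linear in the number of pseudo-words when a kd-tree is used. Reported performance is close to 90% with a very small training set.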
Contribution
The paper combines RLSA block extraction, multi-class SVM labelling, and constrained k-NN context propagation backed by a kd-tree to separate text in complex documents, with running time that is almost linear in the number of pseudo-words.
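As a rough illustration of the context-propagation idea, the following minimal sketch relabels each block by a majority vote over itself and its nearest neighbours. Everything here is an assumption made for illustration: the function name `propagate_labels`, the use of block centres as positions, the parameters `k` and `max_dist` (a distance cap standing in for the paper's unspecified "constraint"), and the tie rule. It also uses a brute-force neighbour scan rather than a kd-tree, so it does not have the near-linear cost the paper reports.

```python
import math
from collections import Counter


def propagate_labels(centers, labels, k=3, max_dist=50.0):
    """Relabel each block by majority vote over itself and nearby blocks.

    centers  : list of (x, y) block centres (hypothetical representation)
    labels   : initial label per block, e.g. from a first-pass classifier
    k        : maximum number of neighbours that may vote
    max_dist : assumed constraint; farther blocks never vote
    """
    new_labels = []
    for i, (xi, yi) in enumerate(centers):
        # Brute-force neighbour search; a kd-tree would replace this scan.
        near = []
        for j, (xj, yj) in enumerate(centers):
            if j == i:
                continue
            d = math.hypot(xi - xj, yi - yj)
            if d <= max_dist:
                near.append((d, j))
        near.sort()

        # The block votes for its own label, then its k closest neighbours vote.
        votes = Counter([labels[i]])
        votes.update(labels[j] for _, j in near[:k])

        best = max(votes.values())
        winners = [lab for lab, count in votes.items() if count == best]
        # On a tie that includes the current label, keep the current label.
        new_labels.append(labels[i] if labels[i] in winners else winners[0])
    return new_labels


centers = [(0, 0), (10, 0), (20, 0), (30, 0), (500, 0)]
labels = ["printed", "printed", "handwritten", "printed", "handwritten"]
smoothed = propagate_labels(centers, labels, k=2, max_dist=50.0)
```

In this toy input the third block is surrounded by printed blocks and is relabelled, while the isolated fifth block has no neighbour within `max_dist` and keeps its label. The vote is a single synchronous pass over the original labels; whether the paper iterates or weights votes is not stated in the abstract.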
Findings
Reports performance close to 90% for handwritten/printed text separation.
Uses k-NN with a constraint; with a kd-tree, running time is almost linearly proportional to the number of pseudo-words.
Remains effective with a very small learning dataset.
Abstract
The aim of the paper is to separate handwritten and printed text in real documents that contain noise and graphics, including annotations. Pseudo-lines and pseudo-words extracted with the run-length smoothing algorithm (RLSA) serve as the basic blocks for classification. A multi-class support vector machine (SVM) with a Gaussian kernel performs a first labelling of each pseudo-word, including a study of its local neighbourhood. Context is then propagated between neighbours so that possible labelling errors can be corrected. To address running-time complexity, we propose linear-complexity methods based on k-NN with a constraint. When a kd-tree is used, the running time is almost linearly proportional to the number of pseudo-words. The performance of our system is close to 90%, even with a very small learning dataset where samples are basically composed of complex administrative…
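To make the RLSA step concrete, here is a minimal pure-Python sketch of horizontal run-length smoothing on a binary image (1 = ink, 0 = background). The function names, the `max_gap` threshold, and the choice to fill only background runs bounded by ink on both sides are assumptions for illustration; the abstract does not give the paper's thresholds or its exact variant.

```python
def rlsa_row(row, max_gap):
    """Fill background runs of length <= max_gap that lie between two ink pixels."""
    out = list(row)
    last_ink = None  # index of the most recent ink pixel seen so far
    for i, pixel in enumerate(row):
        if pixel == 1:
            if last_ink is not None:
                gap = i - last_ink - 1
                if 0 < gap <= max_gap:
                    for j in range(last_ink + 1, i):
                        out[j] = 1
            last_ink = i
    return out


def rlsa_horizontal(image, max_gap):
    """Apply horizontal smoothing independently to every row of the image."""
    return [rlsa_row(row, max_gap) for row in image]


def row_blocks(row):
    """Return inclusive (start, end) column spans of consecutive ink pixels."""
    blocks = []
    start = None
    for i, pixel in enumerate(row):
        if pixel == 1 and start is None:
            start = i
        elif pixel != 1 and start is not None:
            blocks.append((start, i - 1))
            start = None
    if start is not None:
        blocks.append((start, len(row) - 1))
    return blocks


row = [1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1]
word_like = row_blocks(rlsa_row(row, 2))  # small gap: strokes merge into word-sized blocks
line_like = row_blocks(rlsa_row(row, 4))  # larger gap: blocks merge into one line-sized block
```

With the small threshold the row splits into two blocks, and with the larger one it becomes a single block. That is one plausible way to obtain pseudo-words and pseudo-lines from the same smoothing routine, though the paper may derive them differently.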
Taxonomy
Topics: Handwritten Text Recognition Techniques · Image Processing and 3D Reconstruction · Image Retrieval and Classification Techniques
