Genetic Programming for Document Segmentation and Region Classification Using Discipulus
N. Priyadharshini, M.S. Vijaya

TL;DR
This paper presents a genetic programming-based method for automatically segmenting documents and classifying their regions as text, image, drawing, or table. It reports 97.5% classification accuracy and aims to reduce the manual effort of data extraction.
Contribution
It uses the Discipulus tool to build a genetic programming classifier that labels document regions with 97.5% accuracy, improving automation in document analysis.
Findings
Achieved 97.5% classification accuracy.
Used the run-length smearing rule to divide the document image into blocks.
Demonstrated effectiveness of genetic programming in document classification.
Abstract
Document segmentation is the process of dividing a document into distinct regions. A document is a collection of information and a standard means of conveying it to others. Extracting data from documents by hand takes considerable human effort and time, and can severely limit the use of information systems. Automatic information extraction from documents has therefore become an important problem, and document segmentation has been shown to help address it. This paper proposes a new approach to segment a document and classify its regions as text, image, drawing, or table. The document image is divided into blocks using the run-length smearing rule, and features are extracted from every block. The Discipulus tool was used to build the genetic programming-based classifier model, which achieved 97.5% classification accuracy.
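The abstract names the run-length smearing rule without giving its details. As an illustration only, here is a minimal sketch of the commonly described form of run-length smearing on a binary image, where 1 is ink and 0 is background. Short background runs between two ink pixels are filled, once along rows and once along columns, and the two results are combined with a logical AND. The function names `smear_row` and `rlsa`, the threshold parameters, and the choice to leave runs touching the image border unfilled are assumptions of this sketch, not details taken from the paper.

```python
def smear_row(row, threshold):
    """Fill background runs (0s) of length <= threshold that lie
    between two foreground pixels (1s). Runs touching either end
    of the sequence are left unchanged (an assumption of this sketch)."""
    out = list(row)
    last_fg = None  # index of the most recent foreground pixel seen
    for i, pixel in enumerate(row):
        if pixel == 1:
            gap = i - last_fg - 1 if last_fg is not None else 0
            if 0 < gap <= threshold:
                for j in range(last_fg + 1, i):
                    out[j] = 1
            last_fg = i
    return out


def rlsa(image, h_threshold, v_threshold):
    """Smear a binary image horizontally and vertically, then AND the results.
    `image` is a list of equal-length rows of 0/1 values."""
    horizontal = [smear_row(r, h_threshold) for r in image]
    # Transpose, smear each column as if it were a row, transpose back.
    columns = [list(c) for c in zip(*image)]
    vertical = [list(r) for r in zip(*[smear_row(c, v_threshold) for c in columns])]
    return [[h & v for h, v in zip(hr, vr)] for hr, vr in zip(horizontal, vertical)]


if __name__ == "__main__":
    # The gap of 2 is filled; the gap of 4 exceeds the threshold and stays.
    print(smear_row([1, 0, 0, 1, 0, 0, 0, 0, 1], 2))
```

In a pipeline like the one the abstract describes, the connected regions of the smeared image would serve as candidate blocks from which features are extracted. The thresholds would need tuning to the scan resolution and layout.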
Taxonomy
Topics: Handwritten Text Recognition Techniques · Smart Agriculture and AI · Image Retrieval and Classification Techniques
