DocParseNet: Advanced Semantic Segmentation and OCR Embeddings for Efficient Scanned Document Annotation
Ahmad Mohammadshirazi, Ali Nosrati Firoozsalari, Mengxi Zhou, Dheeraj Kulshrestha, Rajiv Ramnath

TL;DR
DocParseNet is a novel deep learning model that combines semantic segmentation and OCR embeddings to efficiently and accurately annotate complex scanned documents, outperforming existing methods in accuracy and computational efficiency.
Contribution
DocParseNet introduces a multi-modal deep learning approach that captures the interplay between text and images, achieving high accuracy with significantly fewer parameters and faster training.
Findings
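Achieves 49.78 mIoU on the test set (49.12 on validation), a 58% improvement over state-of-the-art baseline models and an 18% gain over the UNext baseline.
Reduces model size by approximately 25 times and speeds up training by 5 times.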
Maintains high performance with only 2.8 million parameters.
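To put the size claim in perspective, the following back-of-the-envelope sketch works out what a 25-fold reduction implies. The function names are hypothetical, the resulting comparison size is inferred from the figures above rather than reported in this summary, and the mIoU values passed to the second helper are made-up numbers used only to show how a relative gain is computed.

```python
def implied_comparison_params(model_params: float, reduction_factor: float) -> float:
    """Size of the comparison model implied by a stated reduction factor."""
    return model_params * reduction_factor


def relative_gain_percent(new_score: float, old_score: float) -> float:
    """Relative improvement of new_score over old_score, in percent."""
    return (new_score - old_score) / old_score * 100.0


# 2.8 million parameters and "approximately 25 times" come from the summary.
implied = implied_comparison_params(2.8e6, 25)
print(f"{implied:,.0f}")  # 70,000,000 (implied, not a reported figure)

# Hypothetical scores, NOT taken from the paper: 40.0 -> 50.0 mIoU.
print(relative_gain_percent(50.0, 40.0))  # 25.0
```

Whether the 58% and 18% figures are relative gains of this kind is an assumption; the summary does not say how they were computed.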
Abstract
Automating the annotation of scanned documents is challenging, requiring a balance between computational efficiency and accuracy. DocParseNet addresses this by combining deep learning and multi-modal learning to process both text and visual data. This model goes beyond traditional OCR and semantic segmentation, capturing the interplay between text and images to preserve contextual nuances in complex document structures. Our evaluations show that DocParseNet significantly outperforms conventional models, achieving mIoU scores of 49.12 on validation and 49.78 on the test set. This reflects a 58% accuracy improvement over state-of-the-art baseline models and an 18% gain compared to the UNext baseline. Remarkably, DocParseNet achieves these results with only 2.8 million parameters, reducing the model size by approximately 25 times and speeding up training by 5 times compared to other…
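For readers unfamiliar with the metric, here is a minimal sketch of how mean intersection-over-union (mIoU) is commonly computed for semantic segmentation. It is a generic illustration, not the authors' evaluation code: the function name, the flattened label lists, and the choice to skip classes absent from both prediction and ground truth are all assumptions, and evaluation protocols differ on that last point.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Mean IoU over classes that appear in the prediction or the ground truth.

    pred and target are flattened per-pixel class labels of equal length.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    intersection = [0] * num_classes
    pred_count = [0] * num_classes
    target_count = [0] * num_classes

    for p, t in zip(pred, target):
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1

    ious = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - intersection[c]
        if union > 0:  # skip classes absent from both pred and target
            ious.append(intersection[c] / union)

    return sum(ious) / len(ious) if ious else 0.0


# Toy example with three classes and six pixels (made-up labels).
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
print(f"{100 * score:.2f}")  # 50.00
```

In the toy example the per-class IoUs are 1/3, 2/3, and 1/2, which average to 0.5. Scores such as 49.78 are presumably this quantity expressed on a 0 to 100 scale.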
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Semantic Web and Ontologies
