Document Structure Extraction using Prior based High Resolution Hierarchical Semantic Segmentation
Mausoom Sarkar, Milan Aggarwal, Arneh Jain, Hiresh Gupta, Balaji Krishnamurthy

TL;DR
This paper introduces a hierarchical semantic segmentation approach that uses high-resolution images and prior information to extract document structure, reporting state-of-the-art results, especially on forms datasets.
Contribution
The paper presents a novel prior-based deep hierarchical CNN architecture for high-resolution document structure extraction, outperforming existing methods and introducing a new annotated forms dataset.
Findings
Effective high-resolution segmentation with strip-based approach
Outperforms baselines on new forms dataset
Achieves state-of-the-art results on form structure extraction
Abstract
Structure extraction from document images has been a long-standing research topic due to its high impact on a wide range of practical applications. In this paper, we share our findings on employing a hierarchical semantic segmentation network for this task of structure extraction. We propose a prior based deep hierarchical CNN network architecture that enables document structure extraction using very high resolution (1800 x 1000) images. We divide the document image into overlapping horizontal strips such that the network segments a strip and uses its prediction mask as prior for predicting the segmentation of the subsequent strip. We perform experiments establishing the effectiveness of our strip based network architecture through ablation methods and comparison with low-resolution variations. Further, to demonstrate our network's capabilities, we train it on only one type of documents…
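The strip-wise procedure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the strip height (600), the overlap (200), the reading of "1800 x 1000" as 1800 rows by 1000 columns, the all-zero prior for the first strip, and the rule that later strips overwrite earlier predictions in overlapping rows are all assumptions made for illustration. `strip_bounds`, `segment_with_prior`, and the `segment_strip` callable (standing in for the network) are hypothetical names.

```python
import numpy as np


def strip_bounds(image_height, strip_height, overlap):
    """Return (top, bottom) row ranges of overlapping horizontal strips.

    All strips have the same height; the last one is shifted upward if
    needed so that it ends exactly at the bottom of the image.
    """
    if not 0 <= overlap < strip_height:
        raise ValueError("overlap must satisfy 0 <= overlap < strip_height")
    if image_height <= strip_height:
        return [(0, image_height)]
    step = strip_height - overlap
    tops = list(range(0, image_height - strip_height + 1, step))
    if tops[-1] + strip_height < image_height:
        tops.append(image_height - strip_height)
    return [(top, top + strip_height) for top in tops]


def segment_with_prior(image, segment_strip, strip_height=600, overlap=200):
    """Segment an image strip by strip, top to bottom.

    segment_strip(strip, prior) is a hypothetical stand-in for the network:
    it receives the strip pixels and the previous strip's predicted mask,
    and returns a label mask with the strip's height and width.
    """
    height, width = image.shape[:2]
    full_mask = np.zeros((height, width), dtype=np.int64)
    prior = None
    for top, bottom in strip_bounds(height, strip_height, overlap):
        strip = image[top:bottom]
        if prior is None:
            # Assumption: the first strip gets an empty (all-zero) prior.
            prior = np.zeros(strip.shape[:2], dtype=np.int64)
        mask = segment_strip(strip, prior)
        # Assumption: in overlapping rows the later strip's prediction wins.
        full_mask[top:bottom] = mask
        prior = mask
    return full_mask
```

With these assumed defaults, an 1800-row image is covered by four strips starting at rows 0, 400, 800, and 1200. How the real network consumes the prior (for example, as an extra input channel) is not stated in the abstract and is left to the `segment_strip` callable here.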
