Document Structure Extraction using Prior based High Resolution Hierarchical Semantic Segmentation

Mausoom Sarkar; Milan Aggarwal; Arneh Jain; Hiresh Gupta; Balaji Krishnamurthy

arXiv:1911.12170·cs.CV·September 18, 2020

TL;DR

This paper introduces a hierarchical semantic segmentation approach using high-resolution images and prior information to accurately extract document structures, demonstrating state-of-the-art results especially on forms datasets.

Contribution

The paper presents a novel prior-based deep hierarchical CNN architecture for high-resolution document structure extraction, outperforming existing methods and introducing a new annotated forms dataset.

Findings

01. Effective high-resolution segmentation with a strip-based approach

02. Outperforms baselines on a new forms dataset

03. Achieves state-of-the-art results on form structure extraction

Abstract

Structure extraction from document images has been a long-standing research topic due to its high impact on a wide range of practical applications. In this paper, we share our findings on employing a hierarchical semantic segmentation network for this task of structure extraction. We propose a prior-based deep hierarchical CNN architecture that enables document structure extraction using very high resolution (1800 x 1000) images. We divide the document image into overlapping horizontal strips such that the network segments a strip and uses its prediction mask as a prior for predicting the segmentation of the subsequent strip. We perform experiments establishing the effectiveness of our strip-based network architecture through ablation methods and comparison with low-resolution variations. Further, to demonstrate our network's capabilities, we train it on only one type of documents…
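
The strip-wise inference loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the strip height, the overlap, the reading of 1800 x 1000 as height x width, and the `segment_strip` callable standing in for the network are all assumptions made for illustration. The abstract also does not say how the prior is fed to the network, so the sketch simply passes the already-predicted overlapping rows as a second argument.

```python
import numpy as np


def strip_bounds(height, strip_height, overlap):
    """Return (top, bottom) row ranges of overlapping horizontal strips."""
    if not 0 <= overlap < strip_height:
        raise ValueError("overlap must satisfy 0 <= overlap < strip_height")
    step = strip_height - overlap
    bounds = []
    top = 0
    while True:
        bottom = min(top + strip_height, height)
        bounds.append((top, bottom))
        if bottom == height:
            break
        top += step
    return bounds


def segment_with_prior(image, segment_strip, strip_height=600, overlap=200):
    """Segment an image strip by strip, top to bottom.

    `segment_strip(strip, prior)` is a hypothetical stand-in for the network.
    It must return an integer label mask with the same height and width as
    `strip`. `prior` holds the previous strip's predictions in the rows the
    two strips share, and zeros elsewhere (an assumed encoding).
    """
    height, width = image.shape[:2]
    full_mask = np.zeros((height, width), dtype=np.int64)
    prev_bottom = 0
    for top, bottom in strip_bounds(height, strip_height, overlap):
        strip = image[top:bottom]
        prior = np.zeros((bottom - top, width), dtype=np.int64)
        shared = prev_bottom - top  # rows already predicted by the previous strip
        if shared > 0:
            prior[:shared] = full_mask[top:prev_bottom]
        # The newer prediction overwrites the shared rows (an assumed merge rule).
        full_mask[top:bottom] = segment_strip(strip, prior)
        prev_bottom = bottom
    return full_mask


def threshold_segmenter(strip, prior):
    """Toy segmenter for demonstration only: it ignores the prior."""
    return (strip > 0.5).astype(np.int64)


if __name__ == "__main__":
    page = np.random.default_rng(0).random((1800, 1000))
    mask = segment_with_prior(page, threshold_segmenter)
    print(mask.shape)
```

With these assumed settings, an 1800-row page is covered by four strips of 600 rows, each sharing 200 rows with its predecessor. A real model would consume the prior as an additional input rather than ignore it as the toy segmenter does.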
