Deep Visual Template-Free Form Parsing

Brian Davis; Bryan Morse; Scott Cohen; Brian Price; Chris Tensmeyer

arXiv:1909.02576 · cs.CV · September 20, 2019

TL;DR

This paper introduces a deep learning approach for extracting information from noisy, degraded historical form images: it detects pre-printed labels and input text, then pairs them without relying on predefined templates.

Contribution

The authors propose a novel, learned method for detecting and pairing text in degraded forms, along with a new dataset of historical form images for training and validation.

Findings

01 The method outperforms heuristic-based pairing rules.

02 Visual features significantly improve pairing accuracy.

03 Effective on noisy, degraded historical form images.

Abstract

Automatic, template-free extraction of information from form images is challenging due to the variety of form layouts. This is even more challenging for historical forms due to noise and degradation. A crucial part of the extraction process is associating input text with pre-printed labels. We present a learned, template-free solution to detecting pre-printed text and input text/handwriting and predicting pair-wise relationships between them. While previous approaches to this problem have been focused on clean images and clear layouts, we show our approach is effective in the domain of noisy, degraded, and varied form images. We introduce a new dataset of historical form images (late 1800s, early 1900s) for training and validating our approach. Our method uses a convolutional network to detect pre-printed text and input text lines. We pool features from the detection network to classify…
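To make the pairing task concrete, here is a minimal sketch of the kind of heuristic rule a learned approach would be compared against: each input text box is paired with the nearest pre-printed label box by center distance. This is an illustration only, not the paper's baseline or its network. The `Box` structure, the `kind` values, and the `nearest_label_pairs` function are hypothetical names introduced for this example, and the boxes are assumed to be already detected.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Box:
    """Axis-aligned box for a detected text line (hypothetical structure)."""
    x0: float
    y0: float
    x1: float
    y1: float
    kind: str  # assumed values: "label" (pre-printed text) or "input" (filled-in text)

    @property
    def center(self):
        return ((self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2)


def nearest_label_pairs(boxes):
    """Pair every input box with the closest label box by center distance.

    A simple distance heuristic for illustration; it ignores visual
    features entirely, which is what a learned pairing model can exploit.
    """
    labels = [b for b in boxes if b.kind == "label"]
    inputs = [b for b in boxes if b.kind == "input"]
    if not labels:
        return []
    pairs = []
    for inp in inputs:
        cx, cy = inp.center
        best = min(
            labels,
            key=lambda lb: hypot(lb.center[0] - cx, lb.center[1] - cy),
        )
        pairs.append((best, inp))
    return pairs


# Two label/input rows laid out like a simple two-field form.
name_label = Box(0, 0, 50, 10, "label")
name_value = Box(60, 0, 120, 10, "input")
date_label = Box(0, 100, 50, 110, "label")
date_value = Box(60, 100, 120, 110, "input")

pairs = nearest_label_pairs([name_label, name_value, date_label, date_value])
```

On a clean two-row layout like this one the rule pairs each value with the label on its own row. It breaks down when labels sit above their fields, when fields are densely packed, or when detections are noisy, which are the conditions the abstract describes for historical forms.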
