Data-Efficient Information Extraction from Form-Like Documents
Beliz Gunel, Navneet Potti, Sandeep Tata, James B. Wendt, Marc Najork, Jing Xie

TL;DR
This paper presents a data-efficient transfer learning approach for extracting information from form-like documents, demonstrating significant improvements with limited labeled data and multi-domain training.
Contribution
It introduces a simple transfer learning method that enhances information extraction accuracy across diverse document types with minimal labeled data.
Findings
Transfer learning yields up to a 27 F1-point improvement when the training set is small.
Multi-domain transfer learning adds a further gain of 8 F1 points.
Data efficiency and representation learning are key to scaling document information extraction.
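As an illustration of how gains like those above are measured, the minimal sketch below computes F1 from true-positive, false-positive, and false-negative counts and reports the difference between two models in F1 points. The function names and the counts are hypothetical and do not come from the paper.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 is the harmonic mean of precision and recall, computed from raw counts."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def f1_point_gain(baseline: float, improved: float) -> float:
    """Difference between two F1 scores, expressed in points on a 0-100 scale."""
    return (improved - baseline) * 100


# Hypothetical counts for illustration only (not results from the paper).
baseline_f1 = f1_score(tp=50, fp=50, fn=50)    # model trained on the small target set
transfer_f1 = f1_score(tp=75, fp=25, fn=25)    # model with transfer learning
gain = f1_point_gain(baseline_f1, transfer_f1)
print(f"baseline={baseline_f1:.2f} transfer={transfer_f1:.2f} gain={gain:.1f} F1 points")
```

An "F1 point" here is one hundredth of the 0 to 1 F1 range, so a move from 0.50 to 0.75 is a 25-point gain.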
Abstract
Automating information extraction from form-like documents at scale is a pressing need due to its potential impact on automating business workflows across many industries like financial services, insurance, and healthcare. The key challenge is that form-like documents in these business workflows can be laid out in virtually infinitely many ways; hence, a good solution to this problem should generalize to documents with unseen layouts and languages. A solution to this problem requires a holistic understanding of both the textual segments and the visual cues within a document, which is non-trivial. While the natural language processing and computer vision communities are starting to tackle this problem, there has not been much focus on (1) data-efficiency, and (2) ability to generalize across different document types and languages. In this paper, we show that when we have only a small…
Taxonomy
Topics: Handwritten Text Recognition Techniques · Topic Modeling · Multimodal Machine Learning Applications
