RDU: A Region-based Approach to Form-style Document Understanding
Fengbin Zhu, Chao Wang, Wenqiang Lei, Ziyang Liu, Tat Seng Chua

TL;DR
This paper introduces RDU, a novel region-based approach for key information extraction from form-style documents, overcoming sequence tagging limitations by predicting regions in 2D space, and demonstrating strong performance across various document types.
Contribution
The paper proposes a new 2D region prediction framework for KIE, utilizing layout-aware BERT and region proposal modules, enabling flexible training across document types and improving extraction accuracy.
Findings
Achieves impressive results on four types of form-style documents.
Effective in low-resource scenarios with diverse document types.
Outperforms traditional sequence tagging methods.
Abstract
Key Information Extraction (KIE) aims to extract structured information (e.g. key-value pairs) from form-style documents (e.g. invoices), an important step towards intelligent document understanding. Previous approaches generally tackle KIE by sequence tagging, which has difficulty processing sequences that cannot be flattened, especially in documents that mix tables and text. These approaches also require pre-defining a fixed set of labels for each type of document and suffer from label imbalance. In this work, we assume Optical Character Recognition (OCR) has been applied to the input documents, and reformulate the KIE task as a region prediction problem in two-dimensional (2D) space given a target field. Following this new setup, we develop a new KIE model named Region-based Document Understanding (RDU) that takes as input the text content and corresponding…
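To make the region-prediction framing concrete, here is a minimal sketch of the final read-out step only: given OCR tokens with bounding boxes and a region already predicted for a target field, collect the tokens that fall inside it. This is not the RDU model. The names `OcrToken`, `overlap_ratio`, and `tokens_in_region`, the 0.5 overlap threshold, and the naive reading order are all assumptions made for illustration and do not come from the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple

# (x0, y0, x1, y1) with the origin at the top-left corner of the page.
Box = Tuple[float, float, float, float]


@dataclass
class OcrToken:
    """One OCR word and its bounding box (hypothetical container)."""
    text: str
    box: Box


def overlap_ratio(token_box: Box, region: Box) -> float:
    """Fraction of the token's area that lies inside the region."""
    x0 = max(token_box[0], region[0])
    y0 = max(token_box[1], region[1])
    x1 = min(token_box[2], region[2])
    y1 = min(token_box[3], region[3])
    inter = max(0.0, x1 - x0) * max(0.0, y1 - y0)
    area = (token_box[2] - token_box[0]) * (token_box[3] - token_box[1])
    return inter / area if area > 0 else 0.0


def tokens_in_region(tokens: List[OcrToken], region: Box,
                     min_overlap: float = 0.5) -> str:
    """Join the text of tokens mostly covered by the region.

    Assumption: tokens are ordered by (top, left), a naive reading
    order that only works when tokens on one line share the same top.
    """
    hits = [t for t in tokens if overlap_ratio(t.box, region) >= min_overlap]
    hits.sort(key=lambda t: (t.box[1], t.box[0]))
    return " ".join(t.text for t in hits)


# Toy invoice: a header line and a table-like row.
tokens = [
    OcrToken("Invoice", (10, 10, 60, 20)),
    OcrToken("No:", (65, 10, 85, 20)),
    OcrToken("INV-001", (90, 10, 140, 20)),
    OcrToken("Total", (10, 40, 45, 50)),
    OcrToken("42.00", (90, 40, 130, 50)),
]

# A region that a model might predict for the field "invoice number".
invoice_number = tokens_in_region(tokens, (88, 8, 145, 22))
```

Because the answer is a box rather than a per-token label, the same read-out works for any field name and for table cells, which is the flexibility the abstract attributes to the region-based setup.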
Taxonomy
Topics: Handwritten Text Recognition Techniques · Image Retrieval and Classification Techniques · Vehicle License Plate Recognition
Methods: Attention Is All You Need · Linear Layer · Linear Warmup With Linear Decay · Residual Connection · Dense Connections · Weight Decay · Layer Normalization · WordPiece · Multi-Head Attention · Softmax
