GenKIE: Robust Generative Multimodal Document Key Information Extraction
Panfeng Cao, Ye Wang, Qiang Zhang, Zaiqiao Meng

TL;DR
GenKIE is a novel multimodal generative model that improves key information extraction from scanned documents by handling OCR errors and reducing annotation requirements, achieving state-of-the-art results.
Contribution
The paper introduces GenKIE, a generative multimodal model that addresses OCR errors and eliminates the need for token-level annotations in document KIE tasks.
Findings
GenKIE outperforms existing methods on multiple datasets.
The model effectively corrects OCR errors during extraction.
GenKIE demonstrates robustness across diverse document types.
Abstract
Key information extraction (KIE) from scanned documents has gained increasing attention because of its applications in various domains. Although promising results have been achieved by some recent KIE approaches, they are usually built based on discriminative models, which lack the ability to handle optical character recognition (OCR) errors and require laborious token-level labelling. In this paper, we propose a novel generative end-to-end model, named GenKIE, to address the KIE task. GenKIE is a sequence-to-sequence multimodal generative model that utilizes multimodal encoders to embed visual, layout and textual features and a decoder to generate the desired output. Well-designed prompts are leveraged to incorporate the label semantics as the weakly supervised signals and entice the generation of the key information. One notable advantage of the generative model is that it enables…
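The prompt-driven generation described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption for illustration only: the `"<label> is?"` prompt wording, the `"<label> is <value>."` output format, and the helper names `build_template_prompt` and `parse_generated` are hypothetical and are not taken from the paper. The multimodal encoder and decoder are omitted; the sketch covers only how label semantics might be placed in a prompt and how a generated string might be mapped back to fields without token-level annotations.

```python
import re
from typing import Dict, List


def build_template_prompt(labels: List[str]) -> str:
    """Build a fill-in prompt from field labels (hypothetical wording)."""
    return " ".join(f"{label} is?" for label in labels)


def parse_generated(output: str, labels: List[str]) -> Dict[str, str]:
    """Split a generated string into fields, using '<label> is ' as anchors.

    Assumes the decoder emits text such as 'company is ACME. date is 1/2/2018'.
    Labels that never appear in the output map to an empty string.
    """
    by_lower = {label.lower(): label for label in labels}
    alternatives = "|".join(re.escape(label) for label in labels)
    anchor = re.compile(rf"\b({alternatives}) is ", re.IGNORECASE)

    result = {label: "" for label in labels}
    matches = list(anchor.finditer(output))
    for i, match in enumerate(matches):
        # A value runs from the end of its anchor to the start of the next one.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(output)
        value = output[match.end():end].strip().rstrip(".;").strip()
        result[by_lower[match.group(1).lower()]] = value
    return result


labels = ["company", "date", "total"]
prompt = build_template_prompt(labels)
fields = parse_generated(
    "company is ACME SDN BHD. date is 25/12/2018. total is 9.00", labels
)
```

Because the target is a plain string per field rather than a tag per OCR token, a generative model trained this way can in principle emit a corrected value even when the OCR input is noisy, which is the advantage the abstract points to.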
Taxonomy
Topics: Handwritten Text Recognition Techniques · Text and Document Classification Technologies · Advanced Text Analysis Techniques
