GraphRevisedIE: Multimodal Information Extraction with Graph-Revised Network

Panfeng Cao; Jian Wu

arXiv:2410.01160·cs.IR·October 3, 2024

TL;DR

GraphRevisedIE is a lightweight multimodal model that uses graph revision and convolution to improve key information extraction from visually rich documents with diverse layouts, demonstrating strong generalization and performance.

Contribution

The paper introduces GraphRevisedIE, a novel graph-based model that effectively integrates multimodal features and global context for improved document information extraction.

Findings

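01 Achieves comparable or better performance than previous KIE methods on multiple real-world datasets.

02 Generalizes well across documents with diverse layouts.

03 Publishes a new business license dataset, containing both real-life and synthesized documents, for KIE research.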

Abstract

Key information extraction (KIE) from visually rich documents (VRD) is a challenging task in document intelligence for two reasons: the complicated and diverse layouts of VRD make it hard for models to generalize, and methods that exploit the multimodal features in VRD are lacking. In this paper, we propose a lightweight model named GraphRevisedIE that effectively embeds multimodal features such as textual, visual, and layout features from VRD and leverages graph revision and graph convolution to enrich the multimodal embedding with global context. Extensive experiments on multiple real-world datasets show that GraphRevisedIE generalizes to documents of varied layouts and achieves comparable or better performance than previous KIE methods. We also publish a business license dataset that contains both real-life and synthesized documents to facilitate research of…
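
To make the "graph revision followed by graph convolution" idea concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: this page does not give the paper's exact formulation, so the similarity-based revision rule, the mixing weight `lam`, the toy embeddings `h`, and all function names are assumptions chosen for illustration. Each node stands for one text segment whose vector is imagined to already fuse textual, visual, and layout features.

```python
import math

def softmax(row):
    # Numerically stable softmax over one row of scores.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(a, b):
    # Plain nested-list matrix product a @ b.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def revise_graph(a_init, h, lam=0.5):
    # Assumed revision rule: blend the initial adjacency with a
    # similarity graph computed from the current node embeddings.
    h_t = [list(col) for col in zip(*h)]
    a_sim = [softmax(r) for r in matmul(h, h_t)]
    return [
        [lam * ai + (1.0 - lam) * si for ai, si in zip(row_i, row_s)]
        for row_i, row_s in zip(a_init, a_sim)
    ]

def graph_conv(a, h, w):
    # One graph convolution step: ReLU(A @ H @ W).
    out = matmul(matmul(a, h), w)
    return [[max(0.0, v) for v in row] for row in out]

# Toy inputs (hypothetical): 3 text segments with 2-dimensional embeddings.
h = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
# Row-normalized initial graph: a chain with self-loops (nodes 0 and 2 not linked).
a_init = [[0.5, 0.5, 0.0], [1 / 3, 1 / 3, 1 / 3], [0.0, 0.5, 0.5]]
w = [[1.0, -1.0], [0.5, 1.0]]

a_rev = revise_graph(a_init, h)
out = graph_conv(a_rev, h, w)
```

In this sketch the revised adjacency gives nodes 0 and 2 a nonzero edge weight even though the initial graph did not connect them, which is one way a revised graph can carry context between distant segments into the convolution step.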

Code & Models

Repositories

caop-kie/GraphRevisedIE (PyTorch, official)

Taxonomy

Methods: Convolution