Visual Information Extraction in the Wild: Practical Dataset and End-to-end Solution

Jianfeng Kuang, Wei Hua, Dingkang Liang, Mingkun Yang, Deqiang Jiang, Bo Ren, and Xiang Bai

arXiv:2305.07498 · cs.CV · June 16, 2023 · 2 cites

TL;DR

This paper introduces a large-scale, diverse dataset for visual information extraction in real-world scenarios and proposes an end-to-end framework using contrastive learning to improve extraction accuracy.

Contribution

The paper provides a new, more challenging dataset for VIE and develops a novel end-to-end model that effectively bridges the gap between OCR and information extraction tasks.

Findings

01. Existing methods perform worse on the new dataset than on earlier benchmarks, owing to its more varied layouts, backgrounds, and entity types.

02. The proposed method achieves consistent performance improvements on both datasets.

03. The dataset better reflects real-world challenges for VIE applications.

Abstract

Visual information extraction (VIE), which aims to simultaneously perform OCR and information extraction in a unified framework, has drawn increasing attention due to its essential role in various applications like understanding receipts, goods, and traffic signs. However, as existing benchmark datasets for VIE mainly consist of document images without adequate diversity of layout structures, background disturbances, and entity categories, they cannot fully reveal the challenges of real-world applications. In this paper, we propose a large-scale dataset consisting of camera images for VIE, which contains not only a larger variance of layouts, backgrounds, and fonts but also many more types of entities. In addition, we propose a novel framework for end-to-end VIE that combines the stages of OCR and information extraction in an end-to-end learning fashion. Different from the previous…

Code & Models

Repositories

jfkuang/cfam (PyTorch, official)

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques · Handwritten Text Recognition Techniques

Methods: Contrastive Learning