Visual Information Extraction in the Wild: Practical Dataset and End-to-end Solution
Jianfeng Kuang, Wei Hua, Dingkang Liang, Mingkun Yang, Deqiang Jiang, Bo Ren, and Xiang Bai

TL;DR
This paper introduces a large-scale, diverse dataset for visual information extraction in real-world scenarios and proposes an end-to-end framework using contrastive learning to improve extraction accuracy.
Contribution
The paper provides a new, more challenging dataset for VIE and develops a novel end-to-end model that effectively bridges the gap between OCR and information extraction tasks.
Findings
Existing methods perform worse on the new dataset due to increased complexity.
The proposed method achieves consistent performance improvements on both datasets.
The dataset better reflects real-world challenges for VIE applications.
Abstract
Visual information extraction (VIE), which aims to simultaneously perform OCR and information extraction in a unified framework, has drawn increasing attention due to its essential role in various applications like understanding receipts, goods, and traffic signs. However, as existing benchmark datasets for VIE mainly consist of document images without adequate diversity of layout structures, background disturbances, and entity categories, they cannot fully reveal the challenges of real-world applications. In this paper, we propose a large-scale dataset consisting of camera images for VIE, which contains not only a larger variance of layouts, backgrounds, and fonts but also many more types of entities. Besides, we propose a novel framework for end-to-end VIE that combines the stages of OCR and information extraction in an end-to-end learning fashion. Different from the previous…
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques · Handwritten Text Recognition Techniques
Methods: Contrastive Learning
